WO2016026446A1 - Implementation method of a smart camera system, smart camera system, and webcam - Google Patents


Info

Publication number
WO2016026446A1
Authority
WO
WIPO (PCT)
Prior art keywords
webcam
audio data
server
module
processing
Prior art date
Application number
PCT/CN2015/087559
Other languages
English (en)
French (fr)
Inventor
沈海寅
房文新
王禾丰
Original Assignee
北京奇虎科技有限公司
奇智软件(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京奇虎科技有限公司, 奇智软件(北京)有限公司
Publication of WO2016026446A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/18: Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Definitions

  • The invention relates to video monitoring technology, in particular to an implementation method of a smart camera system, a smart camera system, and a webcam.
  • Some existing cameras can be connected to a server through a network. One server can connect to a large number of cameras, and the user can retrieve and view the images taken by a camera through the server.
  • Such a camera can be called a webcam.
  • The interaction between an existing webcam and a user or server typically takes the following two forms:
  • First, the webcam informs the user of its current state through its indicator light or buzzer.
  • The states of a webcam usually include: online, offline, starting up, being viewed, abnormal alarm, and crashed. For example, on the Dropcam camera, a steady blue light indicates that the camera is online, a steady green light indicates that the camera is offline, a blinking blue light indicates that someone is viewing the camera through the server, a flashing red light indicates that the camera itself is abnormal and is raising an alarm, and a steady red light indicates that the camera has crashed.
  • As another example, beep 1 indicates that the camera is online, beep 2 indicates that the camera is offline, and beep 3 indicates that the camera is starting up.
  • Second, the webcam notifies the server that it is online, and transmits the video images it captures to the server at the server's request.
  • A webcam that relies on the color, blinking speed, and brightness of its indicator light, or on the type, speed, and volume of its buzzer sound, can express only very limited information, and the information the webcam provides to the server is relatively simple.
  • The degree of intelligence of existing camera systems therefore needs to be further improved.
  • the present invention has been made in order to provide an implementation method of a smart camera system, an intelligent camera system, and a webcam that overcome the above problems or at least partially solve the above problems.
  • A method for implementing a smart camera system includes: a webcam, while in a video surveillance state, collecting audio data of the environment in which it is located and performing speech recognition on the collected audio data; extracting keywords from the speech recognition result;
  • when an extracted keyword belongs to the predetermined keywords, the webcam sending to a designated server a processing request carrying the identification information of the webcam and basic data, the basic data including at least one of the keyword, the audio data, and video data;
  • and the designated server generating a processing response according to the basic data in the received processing request and, based on the processing response, performing information interaction with the corresponding user smart terminal device and/or the webcam corresponding to the identification information.
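The webcam-side steps of the method (recognize, extract, check against the predetermined keywords, then build a request) can be sketched as follows. This is a minimal illustration, not the patented implementation: `ProcessingRequest`, `handle_audio`, the `recognize` and `extract` callables, and the sample keyword set are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class ProcessingRequest:
    """Hypothetical container for the data a webcam sends to the server."""
    camera_id: str
    keywords: List[str] = field(default_factory=list)


def handle_audio(
    camera_id: str,
    audio: bytes,
    recognize: Callable[[bytes], str],
    extract: Callable[[str], List[str]],
    predetermined: Set[str],
) -> Optional[ProcessingRequest]:
    """Return a request only if an extracted keyword is a predetermined one."""
    text = recognize(audio)        # speech recognition (stubbed by the caller)
    keywords = extract(text)       # keyword extraction (stubbed by the caller)
    if any(k in predetermined for k in keywords):
        return ProcessingRequest(camera_id, keywords)
    return None                    # nothing to send; stay in video surveillance
```

Passing the recognizer and extractor in as callables keeps the sketch independent of any particular speech recognition technology, which the description leaves open.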
  • A smart camera system includes: an acquisition module, disposed in a webcam and adapted to collect audio data of the environment in which the webcam is located when the webcam is in a video surveillance state;
  • a speech recognition module, disposed in the webcam and adapted to perform speech recognition on the audio data collected by the acquisition module;
  • an extraction module, disposed in the webcam and adapted to extract keywords from the speech recognition result;
  • a request module, disposed in the webcam and adapted to send, when an extracted keyword belongs to the predetermined keywords, a processing request carrying the identification information of the webcam and basic data to a designated server, where the basic data includes at least one of the keyword, the audio data, and video data;
  • and a processing module, disposed in the designated server and adapted to generate a processing response according to the basic data in the processing request received by the designated server and, based on the processing response, perform information interaction with the corresponding user smart terminal device and/or the webcam corresponding to the identification information of the webcam.
  • A webcam mainly includes: an acquisition module, adapted to collect audio data of the environment in which the webcam is located when the webcam is in a video surveillance state; a first speech recognition module, adapted to perform speech recognition on the audio data collected by the acquisition module; an extraction module, adapted to extract keywords from the speech recognition result; and a request module, adapted to send, when an extracted keyword belongs to the predetermined keywords, a processing request carrying the identification information of the webcam and basic data to a designated server, so that the designated server generates a processing response according to the basic data in the received processing request and, based on the processing response, performs information interaction with the corresponding user smart terminal device and/or the webcam corresponding to the identification information of the webcam.
  • The basic data includes at least one of the keyword, the audio data, and video data.
  • A computer program comprising computer readable code that, when executed on a computing device, causes the computing device to perform any of the above-described implementation methods of the smart camera system.
  • A computer readable medium in which the computer program described above is stored.
  • In the implementation method of the smart camera system, the smart camera system, and the webcam, the webcam collects audio data, performs speech recognition on the collected audio data, and sends a corresponding processing request to the server based on the speech recognition result, so that the server can carry out the corresponding information interaction with the user smart terminal device and the webcam according to the basic data in the processing request.
  • For example, the server connects the user smart terminal device with the webcam, so that a network call can be made between the user smart terminal device and the webcam. As another example, the server returns to the webcam audio data containing the information the user queried, and the webcam plays that audio data.
  • The embodiments of the invention thus improve the information interaction capability of the webcam, and thereby the intelligence of the smart camera system.
  • FIG. 1 is a flow chart showing an implementation method of an intelligent camera system according to Embodiment 1 of the present invention
  • FIG. 2 is a schematic diagram of an intelligent camera system including a specific structure of a webcam according to a second embodiment of the present invention
  • FIG. 3 is a block diagram schematically showing a computing device for performing an implementation method of a smart camera system according to the present invention
  • Fig. 4 schematically shows a storage unit for holding or carrying program code implementing an implementation method of the smart camera system according to the present invention.
  • Embodiment 1: Implementation method of the smart camera system.
  • The smart camera system in this embodiment mainly comprises a server and webcams, and one server is connected with one or more webcams. For example, a webcam is connected to the server wirelessly (for example, via WIFI, Wireless Fidelity, a technology that interconnects terminals such as personal computers and handheld devices wirelessly).
  • the webcam can also be connected to the server through a wired connection.
  • The server in this embodiment is also connected to multiple user smart terminal devices. For example, a user smart terminal device is connected to the server via WIFI, or via a mobile communication technology such as GSM (Global System for Mobile Communication), CDMA (Code Division Multiple Access), or WCDMA (Wideband Code Division Multiple Access).
  • the server in this embodiment may be a server installed in the cloud, that is, a cloud server.
  • the webcam in this embodiment may be specifically a webcam integrated with a voice recognition function and an audio play function.
  • the user smart terminal device may be an intelligent electronic device such as a smart mobile phone or a desktop computer or a notebook computer or a tablet computer that can exchange information with the server through the mobile communication technology.
  • S100: The webcam, while in the video surveillance state, collects audio data of the environment in which it is located and performs speech recognition on the collected audio data.
  • The webcam of this embodiment can work in several different working states and switches its working state when triggered by certain operations; that is, the webcam can automatically switch from one working state to another according to actual conditions.
  • The working states of the webcam in this embodiment mainly include: a video surveillance state, a call state, and a media data playing state. The video surveillance state is the normal working state of the webcam, in which the webcam captures video of the environment in which it is located.
  • The call state is the exchange of media data (such as audio data or video data) between the webcam and a user smart terminal device. That is, the webcam and the user smart terminal device are connected through the server, so that the user at the webcam's location and the user at the user smart terminal device's location can hold an IP (Internet Protocol) call, i.e. a network call, through the two devices.
  • The media data playing state is the transmission of media data (such as audio data or video data) between the webcam and the server; that is, the webcam receives the media data transmitted by the server and plays it.
  • the webcam in this embodiment will normally be in a video surveillance state.
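The three working states and the switches between them can be modeled as a small state machine. The sketch below is illustrative only: the event names are hypothetical, and the return transitions (back to video surveillance after a call ends or playback finishes) follow the behavior described later in this embodiment.

```python
from enum import Enum


class State(Enum):
    VIDEO_SURVEILLANCE = "video_surveillance"
    CALL = "call"
    MEDIA_PLAYING = "media_playing"


# (current state, event) -> next state; event names are assumptions.
TRANSITIONS = {
    (State.VIDEO_SURVEILLANCE, "call_connected"): State.CALL,
    (State.VIDEO_SURVEILLANCE, "media_received"): State.MEDIA_PLAYING,
    (State.CALL, "call_ended"): State.VIDEO_SURVEILLANCE,
    (State.MEDIA_PLAYING, "playback_finished"): State.VIDEO_SURVEILLANCE,
}


def next_state(state: State, event: str) -> State:
    """Return the new working state; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

A lookup table keeps every permitted switch in one place, so adding a further working state only requires new table entries.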
  • The IP call can be an IP voice call or an IP video call.
  • The IP call can be a multimedia call in an existing social application.
  • For example, the IP call can be a video call in the QQ chat tool or a video chat in the WeChat chat tool.
  • The webcam in this embodiment can collect audio data according to preset parameters (such as the acquisition frequency) whether it is in the video surveillance state, the call state, or the media data playing state. Normally, however, the webcam performs speech recognition on the collected audio data only when it is in the video surveillance state. In practical applications, it is also feasible for the webcam to perform speech recognition on audio data collected in the call state or the media data playing state.
  • The webcam in this embodiment has a simple speech recognition capability; for example, the webcam can convert the audio data it collects into text.
  • The webcam can use existing speech recognition technology to perform speech recognition on the collected audio data.
  • The specific implementation of the speech recognition performed by the webcam is not described in detail in this embodiment.
  • the webcam extracts keywords from the speech recognition result.
  • The webcam can remove words that carry no substantive meaning, such as modal particles and conjunctions, from the speech recognition result, thereby obtaining one or more keywords.
  • When the webcam has converted the collected audio data into text, it can extract keywords from the recognized text in various ways.
  • For example, the webcam can use an existing text keyword extraction algorithm to obtain the keywords.
  • The specific implementation of the keyword extraction by the webcam is not described in detail in this embodiment.
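One simple way to realize the extraction described above is to filter out filler words from the recognized text. The following is a hedged sketch under that assumption; the `STOP_WORDS` list and the function name are hypothetical, and the description does not prescribe any particular algorithm.

```python
from typing import List

# Hypothetical list of words treated as carrying no substantive meaning.
STOP_WORDS = {"please", "the", "a", "an", "and", "or", "to", "of", "um"}


def extract_keywords(text: str) -> List[str]:
    """Lower-case the text, strip punctuation, and drop stop words."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    return [w for w in words if w and w not in STOP_WORDS]
```

For instance, `extract_keywords("Please call dad.")` keeps only `call` and `dad`.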
  • When an extracted keyword belongs to the predetermined keywords, the webcam sends a processing request carrying the identification information of the webcam and basic data to the designated server (i.e., the server), where the basic data includes at least one of keywords, audio data, and video data.
  • The predetermined keywords may be keywords stored locally in the webcam, or keywords stored in another device.
  • The following takes keywords stored in the webcam as the predetermined keywords, as an example.
  • The webcam is preset with one or more keywords, and the preset keywords form a keyword set. The user can access the server connected to the webcam through a user smart terminal device and use the server to set some or all of the keywords in the webcam's keyword set.
  • In addition, some or all of the keywords in the keyword set may be set in the webcam when it is shipped from the factory.
  • The webcam can compare the extracted keywords with the keywords in the keyword set and generate a corresponding processing request according to the comparison result; for example, the webcam matches each keyword it extracted against the keywords in the keyword set.
  • If any of the extracted keywords matches a keyword in the stored keyword set, the webcam may generate a corresponding processing request and send it to the server.
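The keyword set and the matching step can be sketched as a small class. This is an illustration under stated assumptions: the class name, its methods, and the case-insensitive comparison are hypothetical choices, not details given in the description.

```python
from typing import Iterable, List


class KeywordSet:
    """Hypothetical keyword set: factory presets plus keywords set via the server."""

    def __init__(self, factory_keywords: Iterable[str]):
        self._keywords = {k.lower() for k in factory_keywords}

    def update_from_server(self, user_keywords: Iterable[str]) -> None:
        # Keywords the user configured through the server are merged in.
        self._keywords |= {k.lower() for k in user_keywords}

    def matched(self, extracted: Iterable[str]) -> List[str]:
        """Return the extracted keywords found in the set, in input order."""
        return [k for k in extracted if k.lower() in self._keywords]
```

A non-empty result from `matched` corresponds to the condition under which the webcam generates a processing request.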
  • The processing request generated by the webcam should carry the identification information of the webcam, to indicate which webcam sent the request to the server.
  • The processing request may further carry the keywords extracted by the webcam, to indicate that the webcam wants the server to perform a corresponding operation according to those keywords. For example, if the keywords carried in the processing request are "call" and "dad", the webcam wants the server to call the corresponding user smart terminal device; if the keywords are "Baidu", "Black Tea", and "Variety", the webcam wants the server to query black tea varieties.
  • When any keyword it extracted matches a keyword in the stored keyword set, the webcam can also carry the collected audio data corresponding to that keyword in the processing request, so that the server can perform more intelligent speech recognition and analysis on the audio data.
  • The processing request sent by the webcam to the server may therefore carry only the identification information of the webcam; the identification information and the keywords; the identification information and the audio data collected by the webcam; or the identification information, the keywords, and the audio data collected by the webcam.
  • The webcam may carry the collected audio data in every processing request sent to the server, or only when needed.
  • For example, when the webcam cannot clearly determine from the speech recognition result which operation the user is requesting of the server, it carries the collected audio data in the processing request; when the webcam is clear about the requested operation from the speech recognition result, it may omit the collected audio data from the processing request.
  • the processing request sent by the webcam to the server may carry the captured video data, which facilitates the server to further analyze the needs of the user at the webcam.
  • the audio data and the video data carried in the processing request of this embodiment are both audio data and video data including a time period corresponding to the dangerous image.
  • the processing request in this embodiment may be a message based on HTTP (Hypertext Transfer Protocol), or may be a message based on other protocols.
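As one possible encoding of such a request, the body of an HTTP message could be JSON with the binary media base64-encoded. The field names and the choice of JSON and base64 below are assumptions made for illustration; the description only states that the request may be HTTP-based.

```python
import base64
import json
from typing import Iterable, Optional


def build_processing_request(
    camera_id: str,
    keywords: Optional[Iterable[str]] = None,
    audio: Optional[bytes] = None,
    video: Optional[bytes] = None,
) -> str:
    """Serialize the identification information and any basic data as JSON."""
    body = {"camera_id": camera_id}            # always present
    if keywords:
        body["keywords"] = list(keywords)
    if audio:
        body["audio"] = base64.b64encode(audio).decode("ascii")
    if video:
        body["video"] = base64.b64encode(video).decode("ascii")
    return json.dumps(body)
```

Fields that are absent are simply omitted, which mirrors the combinations of identification information, keywords, and media listed above.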
  • The identification information of the webcam in this embodiment may be the physical device code of the webcam, the mobile phone number of the user's smart mobile phone, or a user account of a social application, such as a user account of the QQ chat tool or of the WeChat chat tool.
  • The webcam in this embodiment has a simple language analysis capability and can use it to perform corresponding operations; that is, the webcam can recognize whether the audio data it collects contains a predetermined keyword.
  • When the webcam determines that the collected audio data contains a predetermined keyword, it generates a corresponding processing request and sends that request to the server connected to it.
  • The server generates a processing response according to the basic data in the received processing request and, according to the processing response, performs information interaction with the corresponding user smart terminal device and/or the webcam corresponding to the identification information carried in the processing request.
  • The information interaction performed by the server according to the received processing request may specifically be: a call-connecting operation, a user-notification operation, a query-and-return operation, or an invalid-information operation. Correspondingly, the processing response may be a response for the call, for the notification, for the query, or for the invalid information.
  • The call-connecting operation connects an IP call between the user smart terminal device and the webcam. The user-notification operation sends corresponding prompt information to the user smart terminal device. The query-and-return operation obtains the content the webcam needs to query and returns it to the webcam.
  • The invalid-information operation returns to the webcam information indicating that the audio data collected by the webcam is meaningless.
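The four kinds of operation, and which device each one addresses, can be captured in a small table. The sketch below is illustrative: the enum, the `recipients` field, and the response layout are hypothetical, while the mapping of operations to recipients follows the descriptions above.

```python
from enum import Enum
from typing import Any, Dict, Optional


class Operation(Enum):
    CONNECT_CALL = "connect_call"
    NOTIFY_USER = "notify_user"
    QUERY = "query"
    INVALID = "invalid"


# Which side receives the result of each operation.
RECIPIENTS = {
    Operation.CONNECT_CALL: ["user_terminal", "webcam"],
    Operation.NOTIFY_USER: ["user_terminal"],
    Operation.QUERY: ["webcam"],
    Operation.INVALID: ["webcam"],
}


def build_response(op: Operation, camera_id: str,
                   payload: Optional[Any] = None) -> Dict[str, Any]:
    """Build a processing response describing the operation and its recipients."""
    return {"type": op.value, "camera_id": camera_id,
            "recipients": RECIPIENTS[op], "payload": payload}
```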
  • The server performs the corresponding operation according to preset default operation information.
  • For example, when the server receives the processing request, it obtains the identification information of the webcam from the processing request and uses that identification information (such as the user account information of the webcam) to look up the user account information of the user smart terminal device in its stored information.
  • It then connects an IP call between the webcam and the user smart terminal device according to the user account information of the webcam and of the user smart terminal device. While the server has the IP call between the two connected, the webcam is in the call state.
  • In the call state, the webcam can transmit the audio data and/or video data it is currently collecting to the server in real time, for the server to forward to the user smart terminal device.
  • When the webcam receives audio data sent by the user smart terminal device via the server, it should play that audio data promptly; if the webcam has a display screen, it can also play video data sent by the user smart terminal device via the server. After the IP call between the user smart terminal device and the webcam ends, the webcam switches to the video surveillance state, continues to collect video data and audio data, and performs speech recognition on the collected audio data.
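The default operation described above amounts to a lookup from the webcam's identification information to the bound user account, followed by a call setup. The sketch below assumes a simple in-memory mapping; the `bindings` dictionary, the field names, and the fallback to an invalid response when no binding exists are all hypothetical.

```python
from typing import Dict


def default_operation(camera_id: str, bindings: Dict[str, str]) -> Dict[str, str]:
    """Look up the terminal account bound to a webcam and request a call."""
    terminal_account = bindings.get(camera_id)
    if terminal_account is None:
        # Assumption: with no stored binding, nothing can be called.
        return {"type": "invalid", "camera_id": camera_id}
    return {"type": "connect_call", "camera_id": camera_id,
            "callee_account": terminal_account}
```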
  • When the processing request received by the server carries the identification information of the webcam and the keywords extracted by the webcam, but not the audio data collected by the webcam, the server performs the corresponding operation according to the keywords carried in the processing request.
  • For example, when the server receives the processing request, it obtains the identification information and keywords of the webcam from the processing request. When the keywords include "call" and "dad", the server uses the identification information of the webcam to connect an IP call between the webcam and the user smart terminal device, and the webcam is in the call state while the IP call is connected. After the IP call ends, the webcam switches to the video surveillance state, continues to collect video data, and performs speech recognition on the collected audio data.
  • As another example, when the server receives the processing request, it obtains the identification information and keywords of the webcam from the processing request. When the keywords include "Baidu", "Black Tea", and "Variety", the server uses a search engine to find query results for "black tea variety". Normally the server obtains multiple query results and may select one of them; for example, the server selects the introduction to "black tea variety" in Baidu Encyclopedia.
  • The server converts the content found for "black tea variety" into data of a corresponding format (such as audio data or video data) and returns it to the webcam in a query response. When the webcam receives the query response returned by the server, it switches to the media data playing state. After playing the query result (such as audio data and/or video data) carried in the query response, the webcam automatically switches to the video surveillance state, continues to collect video and audio data, and performs speech recognition on the collected audio data.
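The keyword-driven handling in the two examples above can be sketched as a routing function. The trigger words come from the examples in the text; the function name, the tuple it returns, and the rule of treating the remaining keywords as the callee or the query string are assumptions for illustration.

```python
from typing import List, Optional, Tuple


def route_by_keywords(keywords: List[str]) -> Tuple[str, Optional[str]]:
    """Map carried keywords to an (operation, argument) pair."""
    lowered = [k.lower() for k in keywords]
    if "call" in lowered:
        # Remaining keywords name the party to call, e.g. "dad".
        rest = [k for k in keywords if k.lower() != "call"]
        return ("connect_call", rest[0] if rest else None)
    if "baidu" in lowered:
        # Remaining keywords form the search query.
        query = " ".join(k for k in keywords if k.lower() != "baidu")
        return ("query", query)
    return ("invalid", None)
```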
  • When the processing request received by the server carries the identification information of the webcam and the audio data collected by the webcam, but not the keywords extracted by the webcam, the server performs speech recognition on the audio data carried in the processing request and performs the corresponding operation according to its own speech recognition result. The speech recognition technology of the server in this embodiment is generally more intelligent and more sophisticated than that of the webcam.
  • For example, when the server receives the processing request, it acquires the audio data from the processing request and performs speech recognition on it.
  • If the server determines that the audio data has no practical meaning, it returns to the webcam corresponding to the identification information a processing response carrying information indicating invalid audio data.
  • If the server determines that the audio data is a request to call a user smart terminal device (such as calling 135********), it may determine from its stored information the user account of the user smart terminal device corresponding to 135******** and call that device according to the user account. After reaching the user smart terminal device, the server determines the user account of the webcam according to the identification information of the webcam
  • and, according to the user account of the webcam, connects the IP call between the user smart terminal device and the webcam. While the server has the IP call between the two connected, the webcam is in the call state.
  • After the IP call ends, the webcam switches to the video surveillance state, continues to collect video and audio data, and performs speech recognition on the collected audio data.
  • As another example, when the server receives the processing request, it acquires the audio data from the processing request and performs speech recognition on it. If the server determines that the audio data has no practical meaning, it returns to the webcam corresponding to the identification information
  • a processing response carrying information indicating invalid audio data. If the server determines that the audio data is a user query (such as asking how to get from ** to Beijing Railway Station), it can use a search engine
  • to perform the query according to the search keywords it identified. After obtaining the query result, the server converts it into data of a corresponding format (such as audio data or video data) and carries that data in a query response.
  • The server returns the query response to the webcam corresponding to the identification information. After receiving the query response carrying the query result, the webcam is in the media data playing state and presents the query result to the user,
  • for example by playing the audio data carried in the query response that describes the route.
  • After playback, the webcam automatically switches to the video surveillance state, continues to collect video and audio data, and performs speech recognition on the collected audio data.
  • As a further example, when the server receives the processing request, it acquires the audio data and video data from the processing request and performs speech recognition on the audio data.
  • If the server determines that the audio data has no practical meaning, it returns to the webcam corresponding to the identification information
  • a processing response carrying information indicating invalid audio data. If the server determines that the audio data is a request to call a user smart terminal device (such as calling dad), it may perform image recognition on the image data of the acquired video data.
  • After the server reaches the user smart terminal device,
  • it determines the user account of the webcam according to the identification information of the webcam
  • and, according to that user account, connects the IP call between the user smart terminal device and the webcam.
  • While the server has the IP call between the two connected, the webcam is in the call state. After the IP call between the user smart terminal device and the webcam ends, the webcam switches to the video surveillance state, continues to collect video and audio data, and performs speech recognition on the collected audio data.
  • When the processing request carries both keywords and audio data, the server needs to perform speech recognition on the audio data carried in the processing request.
  • The server can perform the corresponding operation according to its own speech recognition result alone, or according to both its own speech recognition result and the keywords carried in the processing request.
  • In practical applications, the server can decide, according to logic preset internally, whether to refer to the keywords carried in the processing request sent by the webcam when performing the corresponding operation.
  • Likewise, the server may perform image recognition on the video data carried in the processing request, and should decide according to the corresponding logic whether to perform the corresponding operation based on the image recognition result.
  • The logic here can be set according to actual conditions and is not described in detail in this embodiment.
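One form the preset logic could take is a flag that controls whether the keywords sent by the webcam are merged with the server's own recognition result. The sketch below is only an example of such logic; the function name, the request layout, and the `use_camera_keywords` flag are hypothetical.

```python
from typing import Any, Callable, Dict, List


def keywords_for_decision(
    request: Dict[str, Any],
    server_recognize: Callable[[bytes], List[str]],
    use_camera_keywords: bool,
) -> List[str]:
    """Combine server-side recognition with, optionally, the webcam's keywords."""
    result: List[str] = []
    if request.get("audio") is not None:
        # The server's own, more capable recognition comes first.
        result.extend(server_recognize(request["audio"]))
    if use_camera_keywords:
        for k in request.get("keywords", []):
            if k not in result:        # avoid duplicates, keep order
                result.append(k)
    return result
```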
  • Embodiment 2: Smart camera system. The devices included in the smart camera system of this embodiment, and their specific structures, are described in detail below with reference to FIG. 2.
  • the smart camera system shown in FIG. 2 mainly includes: a webcam 200 and a server 210 connected to the webcam 200; although only one webcam 200 is schematically connected to the server 210 in FIG. 2, in practical applications, A server 210 is typically connected to a plurality of web cameras 200.
  • the webcam 200 can be connected to the server 210 via WIFI.
  • the webcam 200 can also be connected to the server 210 by means of a wired connection.
  • The server 210 in this embodiment is also connected to a plurality of user smart terminal devices 220 (only one user smart terminal device 220 is schematically illustrated in FIG. 2). For example, the user smart terminal device 220 is connected to the server 210 via WIFI, or via a mobile communication technology such as GSM, CDMA, or WCDMA.
  • The server 210 in this embodiment may be a server installed in the cloud; that is, the server 210 is a cloud server.
  • the webcam 200 in this embodiment may be specifically a webcam integrated with a voice recognition function and an audio playback function.
  • the user smart terminal device 220 may be a smart electronic device such as a smart mobile phone or a desktop computer or a notebook computer or a tablet computer that can exchange information with the server through a mobile communication technology.
  • the webcam 200 in this embodiment mainly includes an acquisition module 201, a first speech recognition module 202, an extraction module 203, a request module 204, and an interaction processing module 205.
  • the server 210 in this embodiment mainly includes: a processing module 211; and the processing module 211 mainly includes: a second voice recognition module 212, a call module 213, a query module 214, and an invalid response module 215.
  • the acquisition module 201 is mainly adapted to collect audio data of the environment where the webcam 200 is located when the webcam 200 is in a video surveillance state.
  • The webcam 200 can work in a plurality of different working states and switches its working state when triggered by a certain operation; that is, the webcam 200 can automatically switch from one working state to another according to actual conditions.
  • The working states of the webcam 200 in this embodiment mainly include a video surveillance state, a call state, and a media data playing state.
  • Under normal circumstances, the video surveillance state is the normal working state of the webcam 200: the webcam 200 collects video data of the environment where it is located and stores the collected video data, thereby implementing the usual video surveillance function of current cameras.
  • The call state is the interaction of media data (such as audio data or video data) between the webcam 200 and the user smart terminal device 220. That is, the webcam 200 and the user smart terminal device 220 are connected through the server 210, so that the user at the location of the webcam 200 and the user at the location of the user smart terminal device 220 can conduct an IP call (i.e., a network call) through the webcam 200 and the user smart terminal device 220.
  • The media data playing state is the transmission of media data (such as audio data or video data) between the webcam 200 and the server 210. That is, the webcam 200 receives media data (such as audio data or video data) transmitted by the server 210 and plays the media data.
  • the webcam 200 in this embodiment will normally be in a video surveillance state.
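The three working states and the switches between them can be pictured as a small state machine. The following is a minimal sketch only: the event names (`call_connected`, `query_response_received`, and so on) are hypothetical labels chosen for illustration, since the description says only that a switch is triggered by "a certain operation".

```python
from enum import Enum, auto


class WorkState(Enum):
    VIDEO_SURVEILLANCE = auto()  # normal working state
    CALL = auto()                # IP call with a user smart terminal device
    MEDIA_PLAYBACK = auto()      # playing media data sent by the server


# Hypothetical event names; they are not defined by the description.
TRANSITIONS = {
    (WorkState.VIDEO_SURVEILLANCE, "call_connected"): WorkState.CALL,
    (WorkState.VIDEO_SURVEILLANCE, "query_response_received"): WorkState.MEDIA_PLAYBACK,
    (WorkState.CALL, "call_ended"): WorkState.VIDEO_SURVEILLANCE,
    (WorkState.MEDIA_PLAYBACK, "playback_finished"): WorkState.VIDEO_SURVEILLANCE,
}


def next_state(state, event):
    """Return the new state, or the current one if the event does not apply."""
    return TRANSITIONS.get((state, event), state)


def speech_recognition_enabled(state):
    """In the usual case, local speech recognition runs only during surveillance."""
    return state is WorkState.VIDEO_SURVEILLANCE
```

Keeping the transitions in a table makes it easy to see that both the call state and the media data playing state return to the video surveillance state when they finish.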
  • the IP call can be specifically an IP voice call or an IP video call.
  • the IP call can be a multimedia call in an existing social application.
  • For example, the IP call can be a video call in the QQ chat tool or a video chat in the WeChat chat tool.
  • the first speech recognition module 202 is mainly adapted to perform speech recognition on the audio data collected by the acquisition module 201.
  • Whether the webcam 200 in this embodiment is in the video surveillance state, the call state, or the media data playing state, the acquisition module 201 can collect audio data according to preset parameters (such as the acquisition frequency).
  • Under normal circumstances, the first speech recognition module 202 performs speech recognition on the audio data collected by the acquisition module 201 only when the webcam 200 is in the video surveillance state; in practical applications, however, it is also entirely feasible for the first speech recognition module 202 to perform speech recognition on the audio data collected by the acquisition module 201 while the webcam 200 is in the call state or the media data playing state.
  • the webcam 200 in this embodiment has a simple speech recognition processing capability.
  • For example, the first speech recognition module 202 can convert the audio data collected by the acquisition module 201 into text.
  • the first speech recognition module 202 can perform speech recognition processing on the audio data collected by the acquisition module 201 by using an existing speech recognition technology.
  • the specific implementation process of the first speech recognition module 202 for performing speech recognition processing is not described in detail in this embodiment.
  • the extraction module 203 is primarily adapted to extract keywords from the speech recognition results of the first speech recognition module 202.
  • The extraction module 203 may remove unimportant characters or words, such as modal particles and conjunctions, from the speech recognition result of the first speech recognition module 202, thereby obtaining one or more keywords.
  • When the audio data has been converted into text, the extraction module 203 may extract keywords from the recognized text in various ways; for example, the extraction module 203 may use a text keyword extraction algorithm to obtain the keywords. The specific implementation of keyword extraction by the extraction module 203 is not described in detail in this embodiment.
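Since the description leaves the extraction algorithm open, the simplest variant it mentions, dropping particles and conjunctions from the recognized text, can be sketched as follows. The stop-word list and the function name are assumptions made for this illustration; a real device would use a list suited to its language.

```python
# Hypothetical stop-word list (particles, conjunctions, fillers); an assumption
# for illustration, not a list given by the description.
STOP_WORDS = {"please", "and", "or", "the", "a", "an", "to", "of", "for", "um", "uh"}


def extract_keywords(recognized_text):
    """Return the words that remain after removing stop words.

    Order is preserved and duplicates are dropped.
    """
    keywords = []
    for token in recognized_text.lower().split():
        word = token.strip(".,!?;:\"'")
        if word and word not in STOP_WORDS and word not in keywords:
            keywords.append(word)
    return keywords
```

With these assumptions, the recognized text "Please call dad!" would be reduced to the keywords `call` and `dad`.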
  • The request module 204 is mainly adapted to, when a keyword extracted by the extraction module 203 belongs to the predetermined keywords, send a processing request carrying the identification information of the webcam and basic data to the server 210 connected to the webcam 200, where the basic data includes at least one of: the keyword, audio data, and video data.
  • the predetermined keyword may be a keyword stored locally in the webcam, or may be a keyword stored in another device.
  • The following description takes keywords stored in the webcam as the predetermined keywords, by way of example.
  • The webcam 200 is preset with one or more keywords, and these preset keywords form a keyword set. The user can access the server 210 connected to the webcam 200 through the user smart terminal device 220 and use the server 210 to set some or all of the keywords included in the keyword set in the webcam 200; in addition, some or all of the keywords included in the keyword set may be set in the webcam 200 when the webcam 200 leaves the factory.
  • The request module 204 may compare the keywords extracted by the extraction module 203 with the keywords in the keyword set and generate a corresponding processing request according to the comparison result. For example, the request module 204 matches the extracted keywords against the keyword set; if any one of the extracted keywords matches a keyword in the keyword set stored by the webcam 200, the request module 204 may generate a corresponding processing request and send it to the server 210.
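The matching rule described above (a request is generated as soon as any extracted keyword is in the stored keyword set) is essentially a set-membership check. A minimal sketch, with the function name and the sample keyword set chosen purely for illustration:

```python
def match_predetermined(extracted, keyword_set):
    """Return the extracted keywords found in the keyword set, in original order."""
    return [kw for kw in extracted if kw in keyword_set]


# Hypothetical keyword set, as it might be configured through the server.
keyword_set = {"call", "baidu"}

matched = match_predetermined(["call", "dad"], keyword_set)
should_send_request = bool(matched)  # any single match is enough
```

An empty result means that no processing request is generated and the webcam simply keeps monitoring.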
  • The processing request generated by the webcam 200 should carry the identification information of the webcam, to indicate which webcam 200 sent the processing request to the server 210.
  • The processing request may also carry the keywords extracted by the webcam, to indicate that the request module 204 wants the server to perform a corresponding operation according to the keywords carried in the processing request. For example, if the keywords carried in the processing request sent by the request module 204 are "call" and "dad", the request module 204 wants the server 210 to call the corresponding user smart terminal device 220; as another example, if the keywords carried in the processing request are "Baidu", "black tea" and "variety", the request module 204 wants the server 210 to query the varieties of black tea.
  • To enable the server 210 to perform the operation the user expects more accurately, when any keyword extracted by the extraction module 203 matches a keyword in the stored keyword set, the request module 204 may carry in the processing request the audio data, collected by the acquisition module 201, that corresponds to that keyword, so that the server 210 can perform more intelligent speech recognition and analysis on the audio data.
  • It should be noted that the processing request sent by the request module 204 to the server 210 may carry only the identification information of the webcam; the identification information and the keywords; the identification information and the audio data collected by the webcam; or the identification information, the keywords, and the audio data collected by the webcam.
  • The request module 204 may carry the collected audio data in every processing request it sends to the server, or may carry audio data in a processing request only when needed. For example, if the speech recognition result leaves the operation that the user wants the server 210 to perform unclear, the request module 204 carries the collected audio data in the processing request; if the speech recognition result makes that operation clear, the request module 204 may omit the audio data from the processing request.
  • The processing request sent by the webcam to the server may also carry the video data collected by the webcam, which helps the server further analyze the needs of the user at the webcam.
  • The audio data and video data carried in the processing request in this embodiment both cover the time period corresponding to the keyword.
  • the processing request in this embodiment may be an HTTP-based message, or may be a message based on other protocols.
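One way to picture such a processing request is as a JSON body that could be sent over HTTP. The sketch below is only an assumption about the layout: the field names (`camera_id`, `basic_data`, `keywords`, `audio`, `video`) and the base64 encoding of media are illustrative choices, not a format defined by this embodiment.

```python
import base64
import json


def build_processing_request(camera_id, keywords=None, audio=None, video=None):
    """Serialize a processing request as a JSON string.

    The field names are illustrative assumptions, not defined by the source.
    """
    basic_data = {}
    if keywords:
        basic_data["keywords"] = list(keywords)
    if audio is not None:
        # Binary media is base64-encoded so that it fits in a JSON body.
        basic_data["audio"] = base64.b64encode(audio).decode("ascii")
    if video is not None:
        basic_data["video"] = base64.b64encode(video).decode("ascii")
    return json.dumps({"camera_id": camera_id, "basic_data": basic_data})


# Clear intent: keywords only. Unclear intent: the audio is attached as well.
clear = build_processing_request("cam-001", keywords=["call", "dad"])
unclear = build_processing_request("cam-001", keywords=["call"], audio=b"\x00\x01")
```

Making every part of the basic data optional mirrors the combinations listed above, including a request that carries the identification information alone.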
  • The identification information of the webcam in this embodiment may be the physical device code of the webcam, the mobile phone number of the user's smart mobile phone, or a user account of a social application, such as a user account of the QQ chat tool or of the WeChat chat tool.
  • As can be seen from the above, the webcam 200 in this embodiment has a simple language analysis capability and can use it to perform corresponding operations; that is, the webcam 200 can recognize whether the audio data it collects contains a predetermined keyword, and if so, it can generate a corresponding processing request and send that request to the server 210 connected to it.
  • The processing module 211 is mainly adapted to generate a corresponding processing response according to the basic data in the processing request received by the server 210 and, based on the processing response, to perform information interaction with the corresponding user smart terminal device 220 and/or the webcam 200 corresponding to the identification information of the webcam.
  • The information interaction operation performed by the processing module 211 according to the processing request received by the server 210 may specifically be: connecting a call, notifying the user, querying and returning a query result, or returning invalid information. Correspondingly, the processing response may be a processing response for a call, for a notification, for a query, or for invalid information.
  • Connecting a call means connecting the IP call between the user smart terminal device 220 and the webcam 200; notifying the user means sending corresponding prompt information to the user smart terminal device 220; querying and returning a query result means obtaining the content the webcam 200 needs to query and returning it to the webcam 200; returning invalid information means that the server 210 returns to the webcam 200 information indicating that the audio data collected by the webcam 200 is meaningless.
  • the second speech recognition module 212 is mainly adapted to acquire audio data from the processing request received by the server 210 and perform speech recognition on the audio data obtained therefrom.
  • The calling module 213 is mainly adapted to, when the speech recognition result of the second speech recognition module 212 is determined to be a call to the user smart terminal device 220, determine the user account of the user smart terminal device 220 according to the information stored in the server 210 and call the user smart terminal device 220 according to that user account. The calling module 213 also determines the user account of the webcam 200 according to the identification information of the webcam and, according to that user account, connects the IP call between the user smart terminal device 220 and the webcam 200, so that the webcam 200 is in the call state.
  • The query module 214 is mainly adapted to, when the second speech recognition module 212 determines that the speech recognition result is an information query, obtain a query result according to the query keywords and return a query response carrying audio data to the webcam 200 corresponding to the identification information of the webcam.
  • The interaction processing module 205 is mainly adapted to play, when the webcam 200 is in the media data playing state, the audio data carried in the query response sent by the server 210.
  • The invalid response module 215 is mainly adapted to, when the second speech recognition module 212 determines from the speech recognition result that the audio data is meaningless, return a processing response carrying information indicating invalid audio data to the webcam 200 corresponding to the identification information of the webcam.
  • the processing module 211 performs the corresponding operation according to the preset default operation information.
  • The calling module 213 acquires the identification information of the webcam from the processing request and uses it (for example, the user account information of the webcam 200) to look up the user account information of the user smart terminal device 220 in the information stored in the server 210. It then connects the IP call between the webcam and the user smart terminal device according to the user account information of both, and the webcam 200 enters the call state.
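The account lookup performed by the calling module can be illustrated with two in-memory tables. This is a hedged sketch: the tables, the account strings, and the idea of a `default` contact used when the request carries no callee are all assumptions standing in for "the information stored in the server".

```python
# Hypothetical stand-ins for the information stored in the server.
CAMERA_ACCOUNTS = {"cam-001": "camera_account_1"}
CONTACTS = {
    "camera_account_1": {
        "default": "terminal_account_9",  # assumed default callee
        "dad": "terminal_account_7",
    }
}


def resolve_call(camera_id, contact="default"):
    """Return (camera_account, terminal_account) for an IP call, or None.

    None means the camera or the contact is unknown, so no call can be connected.
    """
    camera_account = CAMERA_ACCOUNTS.get(camera_id)
    if camera_account is None:
        return None
    terminal_account = CONTACTS.get(camera_account, {}).get(contact)
    if terminal_account is None:
        return None
    return (camera_account, terminal_account)
```

Once both accounts are known, the server can connect the IP call between them; how the call itself is set up depends on the social application or calling service in use and is outside this sketch.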
  • The interaction processing module 205 can transmit the audio data and/or video data currently collected by the acquisition module 201 to the server 210 in real time, and the server 210 forwards it to the user smart terminal device 220.
  • The interaction processing module 205 should promptly play the audio data transmitted by the user smart terminal device via the server; if the webcam 200 has a display screen, the interaction processing module 205 can also play the video data transmitted by the user smart terminal device via the server. After the IP call between the user smart terminal device 220 and the webcam 200 ends, the webcam 200 switches to the video surveillance state and continues to collect video data and audio data, and the first speech recognition module 202 performs speech recognition on the audio data collected by the acquisition module 201.
  • When the processing request received by the server 210 carries the identification information of the webcam and the keywords extracted by the extraction module 203, but not the audio data collected by the webcam, the corresponding module in the processing module 211 performs the corresponding operation according to the keywords carried in the processing request.
  • For example, when the server 210 receives the processing request, the calling module 213 and the query module 214 obtain the identification information and keywords of the webcam from the processing request. If the keywords include "call" and "dad", the calling module 213 uses the identification information of the webcam (such as the user account information of the webcam) to look up, in the information stored in the server 210, the user account information of the user smart terminal device corresponding to "dad", and connects the IP call between the webcam 200 and the user smart terminal device 220 according to the user account information of the webcam and that of the user smart terminal device 220 found. Once the IP call is connected, the webcam 200 is in the call state. After the IP call ends, the webcam 200 switches to the video surveillance state and continues to collect video data and audio data, and the first speech recognition module 202 performs speech recognition on the audio data collected by the acquisition module 201.
  • As another example, when the server 210 receives the processing request, the calling module 213 and the query module 214 each obtain the identification information and keywords of the webcam from the processing request. If the acquired keywords include "***", "black tea" and "variety", the query module 214 uses a search engine to search for query results corresponding to "black tea variety". If the query module 214 obtains a plurality of query results, it can select one of them, for example the query result from Baidu Encyclopedia.
  • The query module 214 converts the content found for "black tea variety" into data of a corresponding format (such as audio data or video data) and returns it to the webcam 200 in a query response. On receiving the query response returned by the server 210, the webcam 200 switches to the media data playing state, and the interaction processing module 205 plays the query result (such as audio data and/or video data) carried in the query response. After playback, the webcam 200 automatically switches to the video surveillance state and continues to collect video data and audio data, and the first speech recognition module 202 performs speech recognition on the audio data collected by the acquisition module 201.
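The keyword-only case described in the two examples above amounts to routing a request by its keywords. A minimal sketch follows; the trigger words, the function name, and the tuple return value are assumptions for illustration, and the actual search and call set-up are deliberately left out.

```python
# Hypothetical trigger words; a deployment would configure its own.
CALL_TRIGGERS = {"call"}
QUERY_TRIGGERS = {"baidu"}


def dispatch_by_keywords(keywords):
    """Map the keywords of a processing request to an operation.

    Returns ("call", [callee words]), ("query", "search terms"),
    or ("invalid", None) when no trigger word is present.
    """
    lowered = [k.lower() for k in keywords]
    if any(k in CALL_TRIGGERS for k in lowered):
        callees = [k for k in lowered if k not in CALL_TRIGGERS]
        return ("call", callees)
    if any(k in QUERY_TRIGGERS for k in lowered):
        terms = [k for k in lowered if k not in QUERY_TRIGGERS]
        return ("query", " ".join(terms))
    return ("invalid", None)
```

For the keywords "call" and "dad" this yields a call operation with the callee word `dad`; for a search trigger followed by "black tea" and "variety" it yields a query for `black tea variety`.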
  • When the processing request received by the server carries the identification information of the webcam and the audio data collected by the webcam, but not the keywords extracted by the webcam, the second speech recognition module 212 performs speech recognition on the audio data carried in the request, and the calling module 213, the query module 214 or the invalid response module 215 performs the corresponding operation according to the speech recognition result of the second speech recognition module 212. The speech recognition technology of the second speech recognition module 212 in this embodiment is generally more intelligent and more sophisticated than that of the first speech recognition module 202.
  • The second speech recognition module 212 acquires the audio data from the processing request and performs speech recognition on it. If the audio data is determined to have no actual meaning, the invalid response module 215 returns a processing response carrying information indicating invalid audio data to the webcam corresponding to the identification information of the webcam. If the audio data is determined to be a call to a user smart terminal device (such as "call 135********"), the calling module 213 can determine, according to the information stored by the server 210, the user account of the user smart terminal device corresponding to 135********, and call the user smart terminal device according to that user account. The calling module 213 also determines the user account of the webcam according to the identification information of the webcam and connects the IP call between the user smart terminal device and the webcam according to the user account of the webcam; once the calling module 213 has connected the two, the webcam 200 is in the call state. After the IP call between the user smart terminal device 220 and the webcam 200 ends, the webcam 200 switches to the video surveillance state and continues to collect video data and audio data, and the first speech recognition module 202 performs speech recognition on the audio data collected by the acquisition module 201.
  • The second speech recognition module 212 acquires the audio data from the processing request and performs speech recognition on it. If the audio data is determined to have no actual meaning, the invalid response module 215 returns a processing response carrying information indicating invalid audio data to the webcam 200 corresponding to the identification information of the webcam. If the audio data is determined to be a user query for certain content (e.g., a query on how to get from ** to Beijing), the query module 214 can use a search engine to perform a search according to the search keywords identified by the second speech recognition module 212.
  • The query module 214 converts the query result into data of a corresponding format (for example, audio data or video data), carries that data in a query response, and returns the query response to the webcam corresponding to the identification information of the webcam. On receiving the query response carrying the query result from the server 210, the webcam 200 switches to the media data playing state, and the interaction processing module 205 presents the query result to the user, for example by playing the audio data carried in the query response sent by the server 210.
  • After the interaction processing module 205 has presented the query result to the user (for example, after the audio data has been played), the webcam automatically switches to the video surveillance state and continues to collect video data and audio data, and the first speech recognition module 202 performs speech recognition on the audio data collected by the acquisition module 201.
  • The second speech recognition module 212 acquires the audio data from the processing request and performs speech recognition on it. If the audio data is determined to have no actual meaning, the invalid response module 215 returns a processing response carrying information indicating invalid audio data to the webcam 200 corresponding to the identification information of the webcam. If the server 210 determines that the audio data is a call to a user smart terminal device (e.g., "call dad"), the image recognition module in the server 210 may perform image recognition on the video data carried in the processing request to determine which user "dad" refers to, and the calling module 213 then determines, according to the information stored by the server 210, the user smart terminal device corresponding to that user.
  • The calling module 213 determines the user account of the webcam according to the identification information of the webcam and connects the IP call between the user smart terminal device and the webcam according to the user account of the webcam; once the calling module 213 has connected the IP call between the two, the webcam 200 is in the call state.
  • After the IP call between the two ends, the webcam 200 switches to the video surveillance state and continues to collect video data and audio data, and the first speech recognition module 202 performs speech recognition on the audio data collected by the acquisition module 201.
  • The second speech recognition module 212 needs to perform speech recognition on the audio data carried in the processing request. The calling module 213, the query module 214, and the invalid response module 215 may perform the corresponding operation according to the speech recognition result of the second speech recognition module 212 alone, or according to that result together with the keywords carried in the processing request; in practical applications, these modules may determine, according to preset logic, whether to refer to the keywords carried in the processing request sent by the webcam when performing the corresponding operation.
  • The image recognition module in the server may perform image recognition on the video data carried in the processing request, and the calling module 213, the query module 214, and the invalid response module 215 decide, according to corresponding logic, whether to refer to the image recognition result when performing the corresponding operation.
  • The logic here can be set according to actual conditions and is not described in detail in this embodiment.
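Because the preset logic is left open, the following is just one possible rule, stated as an assumption: trust the server-side speech recognition result when it yields a usable intent, and fall back to the keywords sent by the webcam only when it does not and when the fallback is enabled. The names and the trigger words are hypothetical.

```python
def decide_operation(recognized_intent, keywords, refer_to_keywords=True):
    """Pick 'call', 'query' or 'invalid'.

    recognized_intent is the outcome of the server-side recognition, or None
    if it found no usable meaning. The fallback rule is an assumed preset logic.
    """
    if recognized_intent in ("call", "query"):
        return recognized_intent
    if refer_to_keywords:
        lowered = {k.lower() for k in keywords}
        if "call" in lowered:       # hypothetical call trigger word
            return "call"
        if "baidu" in lowered:      # hypothetical query trigger word
            return "query"
    return "invalid"
```

The `refer_to_keywords` switch corresponds to the choice, described above, of whether the modules consult the keywords at all; an image recognition result could be added as a further input in the same way.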
  • modules in the devices of the embodiments can be adaptively changed and placed in one or more devices different from the embodiment.
  • the modules or units or components of the embodiments may be combined into one module or unit or component, and further they may be divided into a plurality of sub-modules or sub-units or sub-components.
  • All features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination.
  • Each feature disclosed in this specification (including the accompanying claims, the abstract and the drawings) may be replaced by alternative features that provide the same, equivalent or similar purpose.
  • the various component embodiments of the present invention may be implemented in hardware, or in a software module running on one or more processors, or in a combination thereof.
  • A microprocessor or digital signal processor may be used in practice to implement some or all of the functions of some or all of the components of the smart camera system and/or webcam in accordance with embodiments of the present invention.
  • The invention can also be implemented as a device or apparatus program (e.g., a computer program or a computer program product) for performing some or all of the methods described herein.
  • a program implementing the invention may be stored on a computer readable medium or may be in the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
  • Figure 3 illustrates a computing device that can carry out the implementation method of the smart camera system in accordance with the present invention.
  • the computing device conventionally includes a processor 310 and a computer program product or computer readable medium in the form of a memory 320.
  • the memory 320 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read Only Memory), an EPROM, a hard disk, or a ROM.
  • The memory 320 has a storage space 330 for program code 331 for performing any of the method steps described above.
  • For example, the storage space 330 for program code may include individual program codes 331, each implementing one of the steps in the above methods.
  • The program code can be read from, or written to, one or more computer program products.
  • Such computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks.
  • Such a computer program product is typically a portable or fixed storage unit as described with reference to FIG. 4.
  • The storage unit may have storage segments, storage spaces, and the like arranged similarly to the memory 320 in the computing device of FIG. 3.
  • the program code can be compressed, for example, in an appropriate form.
  • The storage unit includes computer readable code 331', i.e., code readable by a processor such as the processor 310, which, when executed by a computing device, causes the computing device to perform each step of the methods described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

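The invention discloses an implementation method of a smart camera system, a smart camera system, and a webcam. The smart camera system mainly includes a server and a webcam. The method includes: the webcam, in a video surveillance state, collects audio data of the environment where it is located and performs speech recognition on the collected audio data; the webcam extracts keywords from the speech recognition result; when an extracted keyword belongs to the predetermined keywords, the webcam sends to a designated server a processing request carrying the identification information of the webcam and basic data, the basic data including at least one of the keyword, audio data, and video data; the server generates a processing response according to the basic data in the received processing request and, based on the processing response, performs information interaction with the corresponding user smart terminal device and/or the webcam corresponding to the identification information of the webcam.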

Description

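Implementation method of smart camera system, smart camera system, and webcam
Technical Field
The invention relates to video surveillance technology, and in particular to an implementation method of a smart camera system, a smart camera system, and a webcam.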
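Background Art
Some existing cameras can be connected to a server through a network, and one server can be connected to a large number of cameras; a user can retrieve and view the images captured by a camera through the server. Such a camera can be called a webcam.
The interaction between an existing webcam and a user or server typically includes the following two types:
1. The webcam informs the user of its current state through components such as its indicator light or buzzer. The states of a webcam usually include: online, offline, starting up, being viewed, abnormality alarm, and crashed. For example, for the Dropcam camera, a steady blue light indicates that the camera is currently online, a steady green light indicates that the camera is currently offline, a blinking blue light indicates that someone is viewing the camera through the server, a blinking red light indicates that the camera itself has detected an abnormality and is raising an alarm, and a steady red light indicates that the camera has currently crashed. As another example, beep 1 indicates that the camera is currently online, beep 2 indicates that the camera is currently offline, and beep 3 indicates that the camera is currently starting up.
2. The webcam notifies the server that it has come online, and the webcam transmits the video images it captures to the server at the server's request.
In the course of implementing the invention, the inventors found that the information a webcam can express through the color, blinking speed and brightness of its indicator light, and through the type of synthesized tone, beeping speed and volume of its buzzer, is very limited, and that the information the webcam provides to the server is rather uniform. It follows that the degree of intelligence of existing camera systems needs to be further improved.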
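Summary of the Invention
In view of the above problems, the present invention is proposed to provide an implementation method of a smart camera system, a smart camera system, and a webcam that overcome the above problems or at least partially solve them.
According to one aspect of the invention, an implementation method of a smart camera system is provided, the method including: a webcam, in a video surveillance state, collects audio data of the environment where it is located and performs speech recognition on the collected audio data; the webcam extracts keywords from the speech recognition result; when an extracted keyword belongs to the predetermined keywords, the webcam sends to a designated server a processing request carrying the identification information of the webcam and basic data, the basic data including at least one of the keyword, audio data, and video data; the designated server generates a processing response according to the basic data in the received processing request and, based on the processing response, performs information interaction with the corresponding user smart terminal device and/or the webcam corresponding to the identification information of the webcam.
According to another aspect of the invention, a smart camera system is provided, the system including: an acquisition module, arranged in a webcam and adapted to collect audio data of the environment where the webcam is located when the webcam is in a video surveillance state; a first speech recognition module, arranged in the webcam and adapted to perform speech recognition on the audio data collected by the acquisition module; an extraction module, arranged in the webcam and adapted to extract keywords from the speech recognition result; a request module, arranged in the webcam and adapted to send, when an extracted keyword belongs to the predetermined keywords, a processing request carrying the identification information of the webcam and basic data to a designated server, the basic data including at least one of the keyword, audio data, and video data; and a processing module, arranged in the designated server and adapted to generate a processing response according to the basic data in the processing request received by the designated server and, based on the processing response, to perform information interaction with the corresponding user smart terminal device and/or the webcam corresponding to the identification information of the webcam.
According to another aspect of the invention, a webcam is provided, the webcam mainly including: an acquisition module, adapted to collect audio data of the environment where the webcam is located when the webcam is in a video surveillance state; a first speech recognition module, adapted to perform speech recognition on the audio data collected by the acquisition module; an extraction module, adapted to extract keywords from the speech recognition result; and a request module, adapted to send, when an extracted keyword belongs to the predetermined keywords, a processing request carrying the identification information of the webcam and basic data to a designated server, so that the designated server generates a processing response according to the basic data in the received processing request and, based on the processing response, performs information interaction with the corresponding user smart terminal device and/or the webcam corresponding to the identification information of the webcam, the basic data including at least one of the keyword, audio data, and video data.
According to yet another aspect of the invention, a computer program is provided, which includes computer readable code that, when run on a computing device, causes the computing device to perform any of the above implementation methods of the smart camera system.
According to a further aspect of the invention, a computer readable medium is provided, in which the above computer program is stored.
In the implementation method of the smart camera system, the smart camera system, and the webcam of the invention, the webcam collects audio data, performs speech recognition on the collected audio data, and sends a corresponding processing request to the server based on the speech recognition result, so that the server can perform corresponding information interaction with the user smart terminal device and the webcam according to the basic data in the processing request. For example, the server connects the user smart terminal device and the webcam so that a network call can be made between them; as another example, the server returns to the webcam the audio data of the information the user wants to query, and the webcam plays that audio data. The embodiments of the invention thereby improve the information interaction capability of the webcam and, in turn, the degree of intelligence of the smart camera system.
The above description is only an overview of the technical solution of the invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the contents of the specification, and so that the above and other objects, features and advantages of the invention become more apparent, specific embodiments of the invention are set out below.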
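Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
FIG. 1 shows a flowchart of the implementation method of the smart camera system according to Embodiment 1 of the invention;
FIG. 2 shows a schematic diagram of the smart camera system, including the specific structure of the webcam, according to Embodiment 2 of the invention;
FIG. 3 schematically shows a block diagram of a computing device for executing the implementation method of the smart camera system according to the invention; and
FIG. 4 schematically shows a storage unit for holding or carrying program code that implements the implementation method of the smart camera system according to the invention.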
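Detailed Description
The invention is further described below with reference to the drawings and specific embodiments.
Embodiment 1: Implementation method of the smart camera system.
The smart camera system in this embodiment mainly includes a server and webcams, and one server is separately connected to one or more webcams. For example, a webcam is connected to the server through a wireless connection (for example WIFI, Wireless Fidelity, a technology that allows terminals such as personal computers and handheld devices to be wirelessly connected to each other); of course, a webcam can also be connected to the server through a wired connection. The server in this embodiment is also separately connected to a plurality of user smart terminal devices; for example, a user smart terminal device is connected to the server through WIFI or a mobile communication technology such as GSM (Global System for Mobile Communication), CDMA (Code Division Multiple Access) or WCDMA (Wideband Code Division Multiple Access).
The server in this embodiment may be a server deployed in the cloud, i.e., a cloud server. The webcam in this embodiment may specifically be a webcam integrating a speech recognition function and an audio playback function. In addition, the user smart terminal device may be a smart electronic device that can exchange information with the server through a mobile communication technology, such as a smart mobile phone, a desktop computer, a notebook computer or a tablet computer.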
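The steps included in the method of this embodiment are described below with reference to FIG. 1.
In FIG. 1, S100: the webcam, in the video surveillance state, collects audio data of the environment where it is located and performs speech recognition on the collected audio data.
Specifically, the webcam of this embodiment can work in a plurality of different working states and switches its working state when triggered by a certain operation; that is, the webcam can automatically switch from one working state to another according to actual conditions.
The working states of the webcam in this embodiment mainly include a video surveillance state, a call state, and a media data playing state. Under normal circumstances, the video surveillance state is the normal working state of the webcam: the webcam collects video data of the environment where it is located and stores the collected video data, thereby implementing the usual video surveillance function of current cameras. The call state is the interaction of media data (such as audio data or video data) between the webcam and the user smart terminal device; that is, the webcam and the user smart terminal device are connected through the server, so that the user at the location of the webcam and the user at the location of the user smart terminal device can conduct an IP (Internet Protocol) call (i.e., a network call) through the webcam and the user smart terminal device. The media data playing state is the transmission of media data (such as audio data or video data) between the webcam and the server; that is, the webcam receives media data (such as audio data or video data) transmitted by the server and plays the media data. The webcam in this embodiment is normally in the video surveillance state.
The IP call may specifically be an IP voice call or an IP video call, and may be a multimedia call in an existing social application; for example, the IP call may be a video call in the QQ chat tool or a video chat in the WeChat chat tool.
Whether the webcam in this embodiment is in the video surveillance state, the call state, or the media data playing state, it can collect audio data according to preset parameters (such as the acquisition frequency). Under normal circumstances, however, the webcam performs speech recognition on the collected audio data only when it is in the video surveillance state; in practical applications, it is also entirely feasible for the webcam to perform speech recognition on the collected audio data while it is in the call state or the media data playing state.
The webcam in this embodiment has a simple speech recognition capability; for example, the webcam can convert the audio data it collects into text. The webcam can use existing speech recognition technology to perform speech recognition on the audio data it collects. The specific implementation of speech recognition by the webcam is not described in detail in this embodiment.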
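S110: the webcam extracts keywords from the speech recognition result.
Specifically, the webcam can remove unimportant characters or words, such as modal particles and conjunctions, from its speech recognition result to obtain one or more keywords. Where the webcam converts the collected audio data into text, it can extract keywords from the recognized text in various ways, for example with a text keyword extraction algorithm. The specific implementation of keyword extraction is not described in detail in this embodiment.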
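A minimal sketch of the filtering idea described above, assuming the recognized text has already been split into word tokens (the specification leaves both the segmentation and the extraction algorithm open). The stop list and the function name `extract_keywords` are hypothetical.

```python
# Hypothetical stop list: modal particles and conjunctions treated as unimportant.
STOP_WORDS = {"啊", "吧", "呢", "吗", "和", "并且", "please", "and", "the", "to"}


def extract_keywords(tokens: list[str]) -> list[str]:
    """Drop stop words and duplicates, keeping first-seen order."""
    seen = set()
    keywords = []
    for token in tokens:
        word = token.strip().lower()
        if word and word not in STOP_WORDS and word not in seen:
            seen.add(word)
            keywords.append(word)
    return keywords
```

A real device would more likely rely on a proper keyword extraction algorithm; the stop-list filter only illustrates the "remove particles and conjunctions" step.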
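S120: when a keyword extracted by the webcam belongs to the predetermined keywords, the webcam sends to the designated server (the server described above) a processing request carrying the webcam's identification information and basic data, where the basic data includes at least one of the keywords, audio data and video data.
Specifically, the predetermined keywords may be stored locally in the webcam or in another device. The following description takes predetermined keywords stored in the webcam as an example.
One or more keywords are preset in the webcam and form a keyword set. A user can access the server connected to the webcam through the user smart terminal device and use the server to set some or all of the keywords in the webcam's keyword set; some or all of the keywords in the set may also be set in the webcam at the factory.
The webcam can compare the keywords it extracted with those in the keyword set and generate a processing request according to the result: if any one of the extracted keywords matches a keyword in the stored keyword set, the webcam generates the corresponding processing request and sends it to the server.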
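The processing request generated by the webcam should carry the webcam's identification information, to indicate which webcam sent the request. The request may also carry the keywords the webcam extracted, to indicate the operation the webcam wants the server to perform. For example, if the keywords carried are “呼叫” (call) and “爸爸” (Dad), the webcam wants the server to call the corresponding user smart terminal device; if the keywords carried are “百度” (Baidu), “红茶” (black tea) and “品种” (varieties), the webcam wants the server to look up varieties of black tea.
So that the server can perform the operation the user expects more accurately, the webcam can, when any extracted keyword matches a keyword in its stored keyword set, carry the collected audio data corresponding to those keywords in the processing request, allowing the server to apply more intelligent speech recognition and analysis to that audio.
It should be noted that the processing request sent to the server may carry only the webcam's identification information; the identification information and keywords; the identification information and the audio data collected by the webcam; or the identification information, keywords and collected audio data. The webcam may carry its collected audio data in every processing request, or only when needed: when the webcam cannot tell from its speech recognition result which operation the user wants the server to perform, it includes the collected audio data in the request; when the operation is clear from the recognition result, it may omit the audio data. The processing request may also carry video data collected by the webcam, which helps the server further analyze the needs of the user at the webcam. The audio data and video data carried in the processing request of this embodiment both cover the time period corresponding to the keywords.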
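In addition, the processing request in this embodiment may be a message based on HTTP (Hypertext Transfer Protocol) or on another protocol. The webcam's identification information may be the physical device code of the webcam, the mobile number of the user's smartphone, or a user account of a social application, such as a QQ or WeChat user account.
As described above, the webcam in this embodiment has a simple language analysis capability and uses it to act: it can recognize whether the collected audio data contains a predetermined keyword and, if so, generate the corresponding processing request and send it to the server it is connected to.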
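Step S120 can be sketched as follows: match the extracted keywords against the predetermined set and, on a match, build a request body that carries the identification information plus optional audio. The JSON layout, the field names (`camera_id`, `keywords`, `audio_hex`) and the `intent_is_clear` flag are all assumptions made for illustration; the specification fixes neither a message format nor a transport.

```python
import json
from typing import Optional

# Hypothetical predetermined keyword set stored on the webcam.
PREDEFINED_KEYWORDS = {"呼叫", "百度"}


def build_request(camera_id: str, keywords: list[str],
                  audio: Optional[bytes] = None,
                  intent_is_clear: bool = True) -> Optional[str]:
    """Return a JSON request body, or None when no extracted keyword is predetermined."""
    if not any(k in PREDEFINED_KEYWORDS for k in keywords):
        return None
    request = {"camera_id": camera_id, "keywords": keywords}
    # Attach audio only when the webcam cannot tell what the user wants.
    if audio is not None and not intent_is_clear:
        request["audio_hex"] = audio.hex()
    return json.dumps(request, ensure_ascii=False)
```

The returned string could then be sent as the body of an HTTP message or over another protocol, as the text allows.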
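S130: the server generates a processing response according to the basic data in the received processing request and, based on that response, exchanges information with the corresponding user smart terminal device and/or the webcam corresponding to the identification information carried in the request.
Specifically, the information interaction the server performs for a received processing request may be connecting a conversation, notifying the user, querying and returning the query result, or returning invalid information; correspondingly, the processing response may be a response for a call, for a notification, for a query, or for invalid information. Connecting a conversation means setting up the IP conversation between the user smart terminal device and the webcam. Notifying the user means sending the corresponding prompt to the user smart terminal device. Querying and returning the query result means obtaining the content the webcam needs to look up and returning it to the webcam. Returning invalid information means the server returns to the webcam information indicating that the audio data it collected is meaningless.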
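When the received processing request carries the webcam's identification information but neither the keywords extracted by the webcam nor the audio data it collected, the server performs an operation according to preset default operation information. For example, on receiving the request the server obtains the webcam's identification information (such as the webcam's user account information), uses it to look up the user account information of the user smart terminal device in its stored information, and uses the two accounts to connect an IP call between the webcam and the user smart terminal device; once the server connects the call, the webcam is in the call state. In the call state the webcam can transmit the audio and/or video data it is currently collecting to the server in real time for forwarding to the user smart terminal device, and it should promptly play audio data sent by the user smart terminal device via the server; if the webcam has a display, it can also play video data sent by the user smart terminal device via the server. After the IP call between the user smart terminal device and the webcam ends, the webcam switches to the video monitoring state and continues to collect video and audio data and to perform speech recognition on the collected audio data.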
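When the received processing request carries the webcam's identification information and the keywords extracted by the webcam, but not the audio data collected by the webcam, the server performs an operation according to the keywords. For example, on receiving the request the server obtains the webcam's identification information and the keywords; if the keywords include “呼叫” (call) and “爸爸” (Dad), the server uses the webcam's identification information (such as the webcam's user account information) to look up, in its stored information, the user account information of the user smart terminal device corresponding to Dad, and uses the two accounts to connect an IP call between the webcam and that device. Once the call is connected the webcam is in the call state; after the IP call ends, the webcam switches to the video monitoring state and continues to collect video and audio data and to perform speech recognition on the collected audio data. As another example, if the keywords obtained from the request include “百度” (Baidu), “红茶” (black tea) and “品种” (varieties), the server uses a search engine to look up results for “红茶品种” (black tea varieties). The server will normally obtain multiple results and can select one of them, for instance the Baidu Baike introduction to black tea varieties. The server converts the content found into data of a suitable format (such as audio or video data) and returns it to the webcam in a query response. On receiving the query response, the webcam switches to the media data playback state; after playing the query result carried in the response (audio and/or video data), it automatically switches back to the video monitoring state and continues to collect video and audio data and to perform speech recognition on the collected audio data.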
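When the received processing request carries the webcam's identification information and the audio data collected by the webcam, but not the keywords extracted by the webcam, the server performs speech recognition on the audio data carried in the request and acts on its own recognition result. The server in this embodiment generally has speech recognition technology that is more intelligent and more sophisticated than the webcam's.
In one specific example, on receiving a processing request the server obtains the audio data from it and performs speech recognition on it. If the server judges that the audio data has no actual meaning, it returns to the webcam corresponding to the identification information a processing response carrying information that indicates invalid audio data. If the server judges that the audio data is a request to call a user smart terminal device (such as a call to 135********), it can determine from its stored information the user account of the user smart terminal device corresponding to 135******** and call that device using the account. After reaching the user smart terminal device, the server determines the webcam's user account from the webcam's identification information and uses it to connect the IP call between the user smart terminal device and the webcam; once the server connects the call, the webcam is in the call state. After the IP call ends, the webcam switches to the video monitoring state and continues to collect video and audio data and to perform speech recognition on the collected audio data.
In another specific example, on receiving a processing request the server obtains the audio data from it and performs speech recognition on it. If the server judges that the audio data has no actual meaning, it returns to the webcam corresponding to the identification information a processing response carrying information that indicates invalid audio data. If the server judges that the audio data is a user query (such as asking how to get from ** to Beijing Railway Station), it can run a search with a search engine using the search keywords it recognized. After obtaining the query result, the server converts it into data of a suitable format (such as audio or video data), places that data in a query response, and returns the query response to the webcam corresponding to the identification information. On receiving the query response, the webcam enters the media data playback state and presents the query result to the user, for example by playing the audio data carried in the response. After presenting the query result in the query response (for example, after the audio finishes playing), the webcam automatically switches to the video monitoring state and continues to collect video and audio data and to perform speech recognition on the collected audio data.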
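In yet another specific example, on receiving a processing request the server obtains the audio data and video data from it and performs speech recognition on the audio data. If the server judges that the audio data has no actual meaning, it returns to the webcam corresponding to the identification information a processing response carrying information that indicates invalid audio data. If the server judges that the audio data is a request to call a user smart terminal device (such as "call Dad"), it can perform image recognition on the video data to determine which user "Dad" refers to, then determine from its stored information the user account of that user's smart terminal device and call the device using the account. After reaching the user smart terminal device, the server determines the webcam's user account from the webcam's identification information and uses it to connect the IP call between the user smart terminal device and the webcam; once the server connects the call, the webcam is in the call state. After the IP call ends, the webcam switches to the video monitoring state and continues to collect video and audio data and to perform speech recognition on the collected audio data.
When the received processing request carries the webcam's identification information, the keywords extracted by the webcam and the audio data collected by the webcam, the server needs to perform speech recognition on the audio data carried in the request. The server may act on its own speech recognition result alone, or on its own result with reference to the keywords carried in the request; in practice, the server can decide according to preset internal logic whether to take the keywords carried in the request into account. In addition, when the processing request carries video data, the server can perform image recognition on it and should decide according to corresponding logic whether to take the image recognition result into account. This logic can be set according to the actual situation and is not described in detail in this embodiment.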
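The server-side decision in S130 can be sketched for the keyword-only and identifier-only cases as below. The lookup tables, the response dictionaries and the function name `handle_request` are hypothetical; the specification says only that the server stores information that maps webcams and contacts to user accounts. The audio path (server-side speech recognition) and image recognition are deliberately left out.

```python
from typing import Optional

# Hypothetical server-side tables, invented for this sketch.
CONTACTS = {("cam-1", "爸爸"): "account-dad"}
DEFAULT_CONTACT = {"cam-1": "account-owner"}


def handle_request(camera_id: str, keywords: Optional[list[str]] = None) -> dict:
    """Choose one of the operations named in the text: call, query, or invalid."""
    if not keywords:
        # Only the identifier was sent: fall back to the preset default operation.
        return {"type": "call", "callee": DEFAULT_CONTACT.get(camera_id)}
    if "呼叫" in keywords:
        others = [k for k in keywords if k != "呼叫"]
        callee = next((CONTACTS[(camera_id, k)] for k in others
                       if (camera_id, k) in CONTACTS), None)
        if callee is not None:
            return {"type": "call", "callee": callee}
        return {"type": "invalid"}
    if "百度" in keywords:
        terms = [k for k in keywords if k != "百度"]
        return {"type": "query", "search_terms": "".join(terms)}
    return {"type": "invalid"}
```

A fuller version would run the server's own speech recognition when audio is present and then decide, by preset logic, whether to also consult the keywords sent by the webcam.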
实施例二、智能摄像***。下面结合图2对本实施例的智能摄像***所包含的各设备以及各设备的具体结构进行详细说明。
图2示出的智能摄像***主要包括:网络摄像头200以及与网络摄像头200连接的服务器210;虽然图2中仅示意性的示出了一个网络摄像头200与服务器210连接,但是在实际应用中,一个服务器210通常与多个网络摄像头200均连接。
网络摄像头200可以通过WIFI与服务器210连接,当然,网络摄像头200也可以通过有线连接方式与服务器210连接。本实施例中的服务器210还与多个用户智能终端设备220分别连接(图2中仅示意性的示出了一个用户智能终端设备220),例如,用户智能终端设备220通过WIFI或者GSM或者CDMA或者WCDMA等移动通讯技术与服务器210连接。
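The server 210 in this embodiment may be a server located in the cloud, that is, the server 210 is a cloud server. The webcam 200 in this embodiment may specifically be a webcam with integrated speech recognition and audio playback functions. The user smart terminal device 220 may be a smartphone, desktop computer, notebook computer, tablet or other smart electronic device that can exchange information with the server through mobile communication technology.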
本实施例中的网络摄像头200主要包括:采集模块201、第一语音识别模块202、提取模块203、请求模块204以及交互处理模块205。
本实施例中的服务器210主要包括:处理模块211;且该处理模块211主要包括:第二语音识别模块212、呼叫模块213、查询模块214以及无效响应模块215。
下面对上述各模块所执行的操作进行说明。
采集模块201主要适于在网络摄像头200处于视频监控状态下,采集网络摄像头200所在环境的音频数据。
具体的,网络摄像头200可以工作在多种不同的工作状态下,并在某一操作的触发下切换其工作状态,也就是说,网络摄像头200可以根据实际情况自动的从其一种工作状态切换到另一种工作状态。
本实施例中的网络摄像头200的工作状态主要包括:视频监控状态、通话状态以及媒体数据播放状态;在通常情况下,视频监控状态是网络摄像头200的正常工作状态,即网络摄像头200采集其所在环境的视频数据,并存储其采集到的视频数据,以实现目前摄像头通常的视频监控功能;通话状态即网络摄像头200与用户智能终端设备220之间所进行的媒体数据(如音频数据或视频数据)的交互,也就是说,网络摄像头200和用户智能终端设备220之间通过服务器210而联通,这样,网络摄像头200位置处的用户和用户智能终端设备220位置处的用户可以通过网络摄像头200和用户智能终端设备220实现IP通话(即网络通话);媒体数据播放状态即网络摄像头200与服务器210之间的媒体数据(如音频数据或者视频数据)的传输,即网络摄像头200接收服务器210传输来的媒体数据(如音频数据或者视频数据),并播放该媒体数据。本实施例中的网络摄像头200在通常情况下会处于视频监控状态。
上述IP通话可以具体为IP语音通话,也可以具体为IP视频通话,该IP通话可以为现有的社交应用中的多媒体通话,如该IP通话可以为QQ聊天工具中的视频通话或者微信聊天工具中的视频聊天等。
第一语音识别模块202主要适于对采集模块201采集的音频数据进行语音识别。
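Specifically, whether the webcam 200 in this embodiment is in the video monitoring state, the call state or the media data playback state, the collection module 201 can collect audio data at a preset collection frequency. Normally, however, the first speech recognition module 202 performs speech recognition on the audio data collected by the collection module 201 only while the webcam 200 is in the video monitoring state; in practice, it is also entirely feasible for the first speech recognition module 202 to perform speech recognition on audio collected by the collection module 201 while the webcam 200 is in the call state or the media data playback state.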
本实施例中的网络摄像头200具有简单的语音识别处理能力,如第一语音识别模块202可以将采集模块201采集的音频数据转化为文本文字等。第一语音识别模块202可以采用现有的语音识别技术对采集模块201采集的音频数据进行语音识别处理。在本实施例中不再详细描述第一语音识别模块202进行语音识别处理的具体实现过程。
提取模块203主要适于从第一语音识别模块202的语音识别结果中提取关键词。
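Specifically, the extraction module 203 can remove unimportant characters or words, such as modal particles and conjunctions, from the speech recognition result of the first speech recognition module 202 to obtain one or more keywords. Where the first speech recognition module 202 converts the audio data collected by the collection module 201 into text, the extraction module 203 can extract keywords from the recognized text in various ways, for example with a text keyword extraction algorithm. The specific implementation of keyword extraction by the extraction module 203 is not described in detail in this embodiment.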
请求模块204主要适于在提取模块203提取的关键词属于预定关键词的情况下,向与网络摄像头200连接的服务器210发送携带有网络摄像头的标识信息以及基础数据的处理请求,这里的基础数据包括:关键词、音频数据以及视频数据中的至少一种。
具体的,预定关键词可以是网络摄像头中本地存储的关键词,也可以是存储于其他设备中的关键词。下述以预定关键词为网络摄像头中存储的关键词为例进行说明。
网络摄像头200中预先设置有一个或者多个关键词,这些预先设置的关键词形成关键词集合;用户可以通过其用户智能终端设备220来访问与网络摄像头200连接的服务器210,并利用服务器210来设置网络摄像头200中的关键词集合所包含的关键词;另外,上述关键词集合所包含的某些或者全部关键词也可以是网络摄像头200在出厂时设置于网络摄像头200中的。
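The request module 204 can compare the keywords extracted by the extraction module 203 with those in the keyword set and generate a processing request according to the result: if any one of the keywords extracted by the extraction module 203 matches a keyword in the keyword set stored in the webcam 200, the request module 204 generates the corresponding processing request and sends it to the server 210.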
网络摄像头200生成的处理请求中应携带有其网络摄像头的标识信息,以表明该处理请求是哪个网络摄像头200发送给服务器210的。该处理请求中还可以携带有网络摄像头提取出的关键词,以表示请求模块204希望服务器能够根据处理请求中携带的关键词而执行相应的操作;例如,请求模块204发送的处理请求中携带的关键词为“呼叫”和“爸爸”,则表示请求模块204希望服务器210执行呼叫相应的用户智能终端设备220的呼叫操作;再例如,请求模块204发送的处理请求中携带的关键词为“百度”、“红茶”以及“品种”,则表示请求模块204希望服务器210执行查询红茶品种的查询操作。
为了使服务器210能够更准确的执行用户所期望的操作,请求模块204可以在提取模块203提取出的任何一个关键词与其存储的关键词集合中的关键词匹配的情况下,将采集模块201采集到的对应上述关键词的相应的音频数据携带在处理请求中,以使服务器210可以对该音频数据进行更智能化的语音识别及分析。
需要特别说明的是,请求模块204发送给服务器210的处理请求中可以携带有网络摄像头的标识信息,也可以携带有网络摄像头的标识信息以及关键词,还可以携带有网络摄像头的标识信息和网络摄像头采集的音频数据,当然,该处理请求也可以携带有网络摄像头的标识信息、关键词以及网络摄像头采集的音频数据;请求模块204可以在其向服务器发送的各处理请求中均携带其采集的音频数据,也可以在需要时才在处理请求中携带音频数据,如请求模块204根据语音识别结果对用户所要求服务器210执行的操作不明确时,请求模块204在处理请求中携带其采集的音频数据,而如果请求模块204根据语音识别结果对用户所要求服务器210执行的操作非常明确时,请求模块204可以不在处理请求中携带其采集的音频数据。网络摄像头发送给服务器的处理请求中可以携带有其采集的视频数据,该视频数据有利于服务器对网络摄像头处的用户的需求进行进一步的分析。本实施例中的音频数据和视频数据均为包含有关键词对应时间段的音频数据和视频数据。
另外,本实施例中的处理请求可以是基于HTTP的消息,也可以是基于其他协议的消息。还有,本实施例中的网络摄像头的标识信息可以为网络摄像头物理设备编码信息,也可以为用户的智能移动电话的手机号码,还可以为社交应用的用户账号,如QQ聊天工具的用户账号或者微信聊天工具的用户账号等。
从上述描述可知,本实施例中的网络摄像头200是具有简单语言分析能力的网络摄像头,且该网络摄像头200能够利用该简单的语言分析能力执行相应的操作;也就是说,网络摄像头200可以识别出其采集的音频数据中是否包含有预定的关键词,且网络摄像头200在分析出其采集的音频数据中包含有预定的关键词的情况下,可以产生相应的处理请求,并向与其连接的服务器210发送其产生的处理请求。
处理模块211主要适于根据服务器210接收到的处理请求中的基础数据产生相应的处理响应,并基于该处理响应执行与相应的用户智能终端设备220和/或网络摄像头的标识信息对应的网络摄像头200的信息交互。
具体的,处理模块211根据服务器210接收到的处理请求所执行的信息交互操作可以具体为:接通对话操作、通知用户操作、查询并返回查询结果的操作或者返回无效信息操作等,相应的,上述处理响应可以是针对呼叫的处理响应,可以是针对通知的处理响应,也可以是针对查询的处理响应,还可以是针对无效信息的处理响应。接通对话操作即联通用户智能终端设备220与网络摄像头200之间的IP对话;通知用户操作即向用户智能终端设备220发送相应的提示信息;查询并返回查询结果的操作即获取网络摄像头200所需查询的内容并将查询到的内容返回给网络摄像头200;返回无效信息操作即服务器210向网络摄像头200返回表示网络摄像头200采集的音频数据无意义的信息。
第二语音识别模块212主要适于从服务器210接收到的处理请求中获取音频数据,并对其获取的音频数据进行语音识别。
呼叫模块213主要适于在判断出第二语音识别模块212的语音识别结果为呼叫用户智能终端设备220的情况下,根据服务器210中存储的信息确定用户智能终端设备220的用户账号,并根据该用户账号呼叫用户智能终端设备220,在接通用户智能终端设备220的情况下,根据网络摄像头的标识信息确定网络摄像头200的用户账号,并根据网络摄像头200的用户账号联通用户智能终端设备220与网络摄像头200之间的IP通话,使网络摄像头200处于通话状态。
查询模块214主要适于在第二语音识别模块212判断出语音识别结果为信息查询的情况下,根据查询关键词获取查询结果,并向网络摄像头的标识信息对应的网络摄像头200返回携带有查询结果的音频数据的查询响应。
交互处理模块205主要适于在网络摄像头200处于媒体数据播放状态的情况下,播放服务器210发送来的查询响应中携带的音频数据。
无效响应模块215主要适于在根据第二语音识别模块212判断出语音识别结果为音频数据无意义的情况下,向网络摄像头的标识信息对应的网络摄像头200返回携带有表示无效音频数据的信息的处理响应。
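Specifically, when the processing request received by the server 210 carries the webcam's identification information but neither the keywords extracted by the webcam 200 nor the audio/video data collected by the webcam 200, the relevant module in the processing module 211 performs an operation according to preset default operation information. For example, when the server 210 receives the request, the call module 213 obtains the webcam's identification information (such as the user account information of the webcam 200) from it, uses it to look up the user account information of the user smart terminal device 220 in the information stored by the server 210, and uses the two accounts to connect an IP call between the webcam and the user smart terminal device; once the call module 213 connects the call, the webcam 200 is in the call state. In the call state, the interaction processing module 205 can transmit the audio and/or video data currently being collected by the collection module 201 to the server 210 in real time for forwarding to the user smart terminal device 220, and when the webcam 200 receives audio data sent by the user smart terminal device via the server, the interaction processing module 205 should play it promptly; if the webcam 200 has a display, the interaction processing module 205 can also play video data sent by the user smart terminal device via the server. After the IP call between the user smart terminal device 220 and the webcam 200 ends, the webcam 200 switches to the video monitoring state and continues to collect video and audio data, and the first speech recognition module 202 performs speech recognition on the audio data collected by the collection module 201.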
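When the processing request received by the server 210 carries the webcam's identification information and the keywords extracted by the extraction module 203, but not the audio data collected by the webcam, the relevant module in the processing module 211 performs an operation according to the keywords. For example, when the server 210 receives the request, the call module 213 and the query module 214 each obtain the webcam's identification information and the keywords from it; if the keywords include “呼叫” (call) and “爸爸” (Dad), the call module 213 uses the webcam's identification information (such as the webcam's user account information) to look up, in the information stored by the server 210, the user account information of the user smart terminal device corresponding to Dad, and uses the two accounts to connect an IP call between the webcam 200 and the user smart terminal device 220. Once the call is connected the webcam 200 is in the call state; after the IP call ends, the webcam 200 switches to the video monitoring state and continues to collect video and audio data, and the first speech recognition module 202 performs speech recognition on the audio data collected by the collection module 201. As another example, if the keywords obtained from the request include “百度” (Baidu), “红茶” (black tea) and “品种” (varieties), the query module 214 uses a search engine to look up results for “红茶品种” (black tea varieties). If it obtains multiple results, the query module 214 can select one of them, for instance the Baidu Baike introduction to black tea varieties. The query module 214 converts the content found into data of a suitable format (such as audio or video data) and returns it to the webcam 200 in a query response. On receiving the query response from the server 210, the webcam 200 switches to the media data playback state. After the interaction processing module 205 finishes playing the query result carried in the query response (audio and/or video data), the webcam automatically switches to the video monitoring state and continues to collect video and audio data, and the first speech recognition module 202 performs speech recognition on the audio data collected by the collection module 201.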
在服务器接收到的处理请求中携带有网络摄像头的标识信息以及网络摄像头采集的音频数据,而没有携带有网络摄像头提取的关键词的情况下,第二语音识别模块212对处理请求中携带的音频数据进行语音识别处理,呼叫模块213、查询模块214或者无效响应模块215根据第二语音识别模块212的语音识别处理结果来执行相应的操作;本实施例中的第二语音识别模块212通常具有比第一语音识别模块202所具有的语音识别技术更智能更复杂的语音识别技术;
一个具体的例子,在服务器接收到处理请求时,第二语音识别模块212从该处理请求中获取音频数据,并对该音频数据进行语音识别处理,在判断该音频数据无实际意义时,无效响应模块215向网络摄像头的标识信息对应的网络摄像头返回携带有表示无效音频数据的信息的处理响应;在判断该音频数据为呼叫用户智能终端设备的情况下(如呼叫135********),呼叫模块213可以根据服务器210存储的信息确定135********对应的用户智能终端设备的用户账号,并根据该用户账号呼叫用户智能终端设备,呼叫模块213在接通用户智能终端设备之后,根据网络摄像头的标识信息确定网络摄像头的用户账号,并根据网络摄像头的用户账号联通用户智能终端设备与网络摄像头之间的IP通话,且在呼叫模块213接通两者之间的IP通话时,网络摄像头200处于通话状态。在用户智能终端设备220与网络摄像头200之间的IP通话结束之后,网络摄像头200切换到视频监控状态,继续采集视频数据以及音频数据,第一语音识别模块202对采集模块201采集的音频数据进行语音识别处理。
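In another specific example, when the server receives a processing request, the second speech recognition module 212 obtains the audio data from it and performs speech recognition on it. If the audio data is judged to have no actual meaning, the invalid response module 215 returns to the webcam 200 corresponding to the identification information a processing response carrying information that indicates invalid audio data. If the audio data is judged to be a user query (such as asking how to get from ** to Beijing Railway Station), the query module 214 can run a search with a search engine using the search keywords recognized by the second speech recognition module 212. After obtaining the query result, the query module 214 converts it into data of a suitable format (such as audio or video data), places that data in a query response, and returns the query response to the webcam corresponding to the identification information. On receiving the query response from the server 210, the webcam 200 enters the media data playback state, and the interaction processing module 205 presents the query result to the user, for example by playing the audio data carried in the query response. After the interaction processing module 205 has presented the query result in the query response (for example, after the audio finishes playing), the webcam automatically switches to the video monitoring state and continues to collect video and audio data, and the first speech recognition module 202 performs speech recognition on the audio data collected by the collection module 201.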
再一个具体的例子,服务器在接收到处理请求时,第二语音识别模块212从该处理请求中获取音频数据,并对该音频数据进行语音识别处理,在判断该音频数据无实际意义时,无效响应模块215向网络摄像头的标识信息对应的网络摄像头200返回携带有表示无效音频数据的信息的处理响应;服务器210在判断该音频数据为呼叫用户智能终端设备的情况下(如呼叫爸爸),服务器210中的图像识别模块可以对处理请求中携带的视频数据进行图像识别,以判断爸爸所指代的用户,然后呼叫模块213根据服务器210存储的信息确定指代的用户对应的用户智能终端设备的用户账号,并根据该用户账号呼叫用户智能终端设备,呼叫模块213在接通用户智能终端设备之后,根据网络摄像头的标识信息确定网络摄像头的用户账号,并根据网络摄像头的用户账号联通用户智能终端设备与网络摄像头之间的IP通话,且在呼叫模块213接通两者之间的IP通话时,网络摄像头200处于通话状态。在用户智能终端设备与网络摄像头200之间的IP通话结束之后,网络摄像头200切换到视频监控状态,继续采集视频以及音频数据,第一语音识别模块202对采集模块201采集的音频数据进行语音识别处理。
在服务器210接收到的处理请求中携带有网络摄像头的标识信息、网络摄像头提取出的关键词以及网络摄像头采集的音频数据的情况下,第二语音识别模块212需要对处理请求中携带的音频数据进行语音识别处理,呼叫模块213、查询模块214和无效响应模块215可以仅根据第二语音识别模块212的语音识别处理结果来执行相应的操作;呼叫模块213、查询模块214和无效响应模块215也可以根据第二语音识别模块212的语音识别处理结果并参考处理请求中携带的关键词执行相应的操作;在实际应用中,呼叫模块213、查询模块214以及无效响应模块215可以根据预先设置的相应的逻辑来决定是否参考网络摄像头传输来的处理请求中携带的关键词来执行相应的操作。另外,在处理请求中携带有视频数据的情况下,服务器中的图像识别模块可以对处理请求中携带的视频数据进行图像识别处理,呼叫模块213、查询模块214以及无效响应模块215应根据相应的逻辑来决定是否参考图像识别结果来执行相应的操作。这里的逻辑可以根据实际情况来设置,在本实施例中不再详细说明。
在此处所提供的说明书中,说明了大量具体细节。然而,能够理解,本发明的实施例可以在没有这些具体细节的情况下实践。在一些实例中,并未详细示出公知的方法、结构和技术,以便不模糊对本说明书的理解。
类似地,应当理解,为了精简本公开并帮助理解各个发明方面中的一个或多个, 在上面对本发明的示例性实施例的描述中,本发明的各个特征有时被一起分组到单个实施例、图、或者对其的描述中。然而,并不应将该公开的方法解释成反映如下意图:即所要求保护的本发明要求比在每个权利要求中所明确记载的特征更多的特征。更确切地说,如下面的权利要求书所反映的那样,发明方面在于少于前面公开的单个实施例的所有特征。因此,遵循具体实施方式的权利要求书由此明确地并入该具体实施方式,其中每个权利要求本身都作为本发明的单独实施例。
本领域那些技术人员可以理解,可以对实施例中的设备中的模块进行自适应性地改变并且把它们设置在与该实施例不同的一个或多个设备中。可以把实施例中的模块或单元或组件组合成一个模块或单元或组件,以及此外可以把它们分成多个子模块或子单元或子组件。除了这样的特征和/或过程或者单元中的至少一些是相互排斥之外,可以采用任何组合对本说明书(包括伴随的权利要求、摘要和附图)中公开的所有特征以及如此公开的任何方法或者设备的所有过程或单元进行组合。除非另外明确陈述,本说明书(包括伴随的权利要求、摘要和附图)中公开的每个特征可以由提供相同、等同或相似目的的替代特征来代替。
此外,本领域的技术人员能够理解,尽管在此所述的一些实施例包括其它实施例中所包括的某些特征而不是其它特征,但是不同实施例的特征的组合意味着处于本发明的范围之内并且形成不同的实施例。例如,在下面的权利要求书中,所要求保护的实施例的任意之一都可以以任意的组合方式来使用。
本发明的各个部件实施例可以以硬件实现,或者以在一个或者多个处理器上运行的软件模块实现,或者以它们的组合实现。本领域的技术人员应当理解,可以在实践中使用微处理器或者数字信号处理器(DSP)来实现根据本发明实施例的智能摄像***和/或网络摄像头中的一些或者全部部件的一些或者全部功能。本发明还可以实现为用于执行这里所描述的方法的一部分或者全部的设备或者装置程序(例如,计算机程序和计算机程序产品)。这样的实现本发明的程序可以存储在计算机可读介质上,或者可以具有一个或者多个信号的形式。这样的信号可以从因特网网站上下载得到,或者在载体信号上提供,或者以任何其他形式提供。
例如,图3示出了可以实现根据本发明的智能摄像***的实现方法的计算设备。该计算设备传统上包括处理器310和以存储器320形式的计算机程序产品或者计算机可读介质。存储器320可以是诸如闪存、EEPROM(电可擦除可编程只读存储器)、EPROM、硬盘或者ROM之类的电子存储器。存储器320具有用于执行上述方法中的任何方法步骤的程序代码331的存储空间330。例如,用于程序代码的存储空间330可以包括分别用于实现上面的方法中的各种步骤的各个程序代码331。这些程序代码可以从一个或者多个计算机程序产品中读出或者写入到这一个或者多个计算机程序产 品中。这些计算机程序产品包括诸如硬盘,紧致盘(CD)、存储卡或者软盘之类的程序代码载体。这样的计算机程序产品通常为如参考图4所述的便携式或者固定存储单元。该存储单元可以具有与图3的计算设备中的存储器320类似布置的存储段、存储空间等。程序代码可以例如以适当形式进行压缩。通常,存储单元包括计算机可读代码331’,即可以由例如诸如310之类的处理器读取的代码,这些代码当由计算设备运行时,导致该计算设备执行上面所描述的方法中的各个步骤。
本文中所称的“一个实施例”、“实施例”或者“一个或者多个实施例”意味着,结合实施例描述的特定特征、结构或者特性包括在本发明的至少一个实施例中。此外,请注意,这里“在一个实施例中”的词语例子不一定全指同一个实施例。
应该注意的是上述实施例对本发明进行说明而不是对本发明进行限制,并且本领域技术人员在不脱离所附权利要求的范围的情况下可设计出替换实施例。在权利要求中,不应将位于括号之间的任何参考符号构造成对权利要求的限制。单词“包含”不排除存在未列在权利要求中的元件或步骤。位于元件之前的单词“一”或“一个”不排除存在多个这样的元件。本发明可以借助于包括有若干不同元件的硬件以及借助于适当编程的计算机来实现。在列举了若干装置的单元权利要求中,这些装置中的若干个可以是通过同一个硬件项来具体体现。单词第一、第二、以及第三等的使用不表示任何顺序。可将这些单词解释为名称。
此外,还应当注意,本说明书中使用的语言主要是为了可读性和教导的目的而选择的,而不是为了解释或者限定本发明的主题而选择的。因此,在不偏离所附权利要求书的范围和精神的情况下,对于本技术领域的普通技术人员来说许多修改和变更都是显而易见的。对于本发明的范围,对本发明所做的公开是说明性的,而非限制性的,本发明的范围由所附权利要求书限定。

Claims (15)

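  1. A method for implementing a smart camera system, comprising:
    a webcam, in a video monitoring state, collecting audio data from its environment and performing speech recognition on the collected audio data;
    the webcam extracting keywords from the speech recognition result;
    when an extracted keyword belongs to predetermined keywords, the webcam sending to a designated server a processing request carrying identification information of the webcam and basic data, the basic data including at least one of the keywords, audio data and video data;
    the designated server generating a processing response according to the basic data in the received processing request and, based on the processing response, performing information interaction with a corresponding user smart terminal device and/or the webcam corresponding to the identification information of the webcam.
  2. The method of claim 1, wherein the webcam is connected to the designated server through WIFI.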
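  3. The method of claim 1, wherein the designated server generating a processing response according to the basic data in the received processing request and, based on the processing response, performing information interaction with the corresponding user smart terminal device and/or the webcam corresponding to the identification information of the webcam comprises:
    the designated server obtaining audio data from the received processing request and performing speech recognition on the obtained audio data;
    the designated server, on determining that the speech recognition result is a call to a user smart terminal device, determining the user account of the user smart terminal device from its stored information and calling the user smart terminal device using that user account;
    the designated server, on reaching the user smart terminal device, determining the user account of the webcam from the identification information of the webcam and connecting an IP call between the user smart terminal device and the webcam using the user account of the webcam, so that the webcam is in a call state.
  4. The method of claim 1, wherein the designated server generating a processing response according to the basic data in the received processing request and, based on the processing response, performing information interaction with the corresponding user smart terminal device and/or the webcam corresponding to the identification information of the webcam comprises:
    the designated server obtaining audio data from the received processing request and performing speech recognition on the obtained audio data;
    the designated server, on determining that the speech recognition result is an information query, obtaining a query result according to query keywords and returning to the webcam corresponding to the identification information of the webcam a query response carrying audio data of the query result;
    the webcam being in a media data playback state and playing the audio data carried in the query response sent by the designated server.
  5. The method of claim 1, wherein the designated server generating a processing response according to the basic data in the received processing request and, based on the processing response, performing information interaction with the corresponding user smart terminal device and/or the webcam corresponding to the identification information of the webcam comprises:
    the designated server obtaining audio data from the received processing request and performing speech recognition on the obtained audio data;
    the designated server, on determining from the speech recognition result that the audio data is meaningless, returning to the webcam corresponding to the identification information of the webcam a processing response carrying information that indicates invalid audio data.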
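  6. A smart camera system, the system comprising:
    a collection module, arranged in a webcam and adapted to collect audio data from the environment of the webcam while the webcam is in a video monitoring state;
    a first speech recognition module, arranged in the webcam and adapted to perform speech recognition on the audio data collected by the collection module;
    an extraction module, arranged in the webcam and adapted to extract keywords from the speech recognition result;
    a request module, arranged in the webcam and adapted to send to a designated server, when an extracted keyword belongs to predetermined keywords, a processing request carrying identification information of the webcam and basic data, the basic data including at least one of the keywords, audio data and video data;
    a processing module, arranged in the designated server and adapted to generate a processing response according to the basic data in the processing request received by the designated server and, based on the processing response, to perform information interaction with a corresponding user smart terminal device and/or the webcam corresponding to the identification information of the webcam.
  7. The system of claim 6, wherein the webcam is connected to the designated server through its WIFI module.
  8. The system of claim 6, wherein the processing module comprises:
    a second speech recognition module, adapted to obtain audio data from the processing request received by the designated server and to perform speech recognition on the obtained audio data;
    a call module, adapted, on determining that the speech recognition result is a call to a user smart terminal device, to determine the user account of the user smart terminal device from information stored in the designated server and to call the user smart terminal device using that user account, and, on reaching the user smart terminal device, to determine the user account of the webcam from the identification information of the webcam and to connect an IP call between the user smart terminal device and the webcam using the user account of the webcam, so that the webcam is in a call state.
  9. The system of claim 6, wherein the processing module comprises:
    a second speech recognition module, adapted to obtain audio data from the processing request received by the designated server and to perform speech recognition on the obtained audio data;
    a query module, adapted, on determining that the speech recognition result is an information query, to obtain a query result according to query keywords and to return to the webcam corresponding to the identification information of the webcam a query response carrying audio data of the query result;
    and the webcam further comprises: an interaction processing module, adapted to play the audio data carried in the query response sent by the designated server while the webcam is in a media data playback state.
  10. The system of claim 6, wherein the processing module comprises:
    a second speech recognition module, adapted to obtain audio data from the processing request received by the designated server and to perform speech recognition on the obtained audio data;
    an invalid response module, adapted, on determining from the speech recognition result that the audio data is meaningless, to return to the webcam corresponding to the identification information of the webcam a processing response carrying information that indicates invalid audio data.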
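  11. A webcam, comprising:
    a collection module, adapted to collect audio data from the environment of the webcam while the webcam is in a video monitoring state;
    a first speech recognition module, adapted to perform speech recognition on the audio data collected by the collection module;
    an extraction module, adapted to extract keywords from the speech recognition result;
    a request module, adapted to send to a designated server, when an extracted keyword belongs to predetermined keywords, a processing request carrying identification information of the webcam and basic data, so that the designated server generates a processing response according to the basic data in the received processing request and, based on the processing response, performs information interaction with a corresponding user smart terminal device and/or the webcam corresponding to the identification information of the webcam, the basic data including at least one of the keywords, audio data and video data.
  12. The webcam of claim 11, wherein the webcam is connected to the designated server through its WIFI module.
  13. The webcam of claim 11, wherein the webcam further comprises:
    an interaction processing module, adapted to play the audio data carried in the query response sent by the designated server while the webcam is in a media data playback state.
  14. A computer program comprising computer-readable code which, when run on a computing device, causes the computing device to execute the method for implementing a smart camera system according to any one of claims 1 to 5.
  15. A computer-readable medium storing the computer program of claim 14.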
PCT/CN2015/087559 2014-08-19 2015-08-19 智能摄像***的实现方法、智能摄像***和网络摄像头 WO2016026446A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410409942.5A CN105407316B (zh) 2014-08-19 2014-08-19 智能摄像***的实现方法、智能摄像***和网络摄像头
CN201410409942.5 2014-08-19

Publications (1)

Publication Number Publication Date
WO2016026446A1 true WO2016026446A1 (zh) 2016-02-25

Family

ID=55350207

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/087559 WO2016026446A1 (zh) 2014-08-19 2015-08-19 智能摄像***的实现方法、智能摄像***和网络摄像头

Country Status (2)

Country Link
CN (1) CN105407316B (zh)
WO (1) WO2016026446A1 (zh)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106685929A (zh) * 2016-12-06 2017-05-17 南京金雀智能科技有限公司 基于可穿戴式蓝牙视频耳机的通信处理***及方法
CN111107548A (zh) * 2019-01-07 2020-05-05 姜鹏飞 一种发送信息的方法、装置、设备及存储介质
CN111901655A (zh) * 2020-08-05 2020-11-06 海信视像科技股份有限公司 一种显示设备及摄像头功能的演示方法
CN112256871A (zh) * 2020-10-16 2021-01-22 国网江苏省电力有限公司连云港供电分公司 一种物资履约***及方法
CN112735413A (zh) * 2020-12-25 2021-04-30 浙江大华技术股份有限公司 一种基于摄像装置的指令分析方法、电子设备和存储介质
CN112801083A (zh) * 2021-01-29 2021-05-14 百度在线网络技术(北京)有限公司 图像识别的方法、装置、设备以及存储介质
CN113140138A (zh) * 2021-04-25 2021-07-20 新东方教育科技集团有限公司 互动教学方法、装置、存储介质及电子设备

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105898219B (zh) 2016-04-22 2019-05-21 北京小米移动软件有限公司 对象监控方法及装置
CN106790490B (zh) 2016-12-14 2019-10-15 北京小米移动软件有限公司 基于智能摄像机进行通话的方法及装置
CN107205097B (zh) * 2017-07-07 2020-09-29 北京小米移动软件有限公司 移动终端查找方法、装置以及计算机可读存储介质
CN110353628A (zh) * 2018-12-27 2019-10-22 深圳市汇春科技股份有限公司 一种单兵紧急救援装备
CN112312084A (zh) * 2020-10-16 2021-02-02 李小丽 一种智能影像监控***

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101079996A (zh) * 2006-05-22 2007-11-28 北京盛开交互娱乐科技有限公司 一种基于视频和音频的交互式数字多媒体制作方法
CN102014278A (zh) * 2010-12-21 2011-04-13 四川大学 一种基于语音识别技术的智能视频监控方法
CN102170617A (zh) * 2011-04-07 2011-08-31 中兴通讯股份有限公司 移动终端及其远程控制方法
CN103002425A (zh) * 2011-09-16 2013-03-27 三星电子(中国)研发中心 自动触发紧急呼叫的方法和***及其移动终端
CN103280217A (zh) * 2013-05-02 2013-09-04 锤子科技(北京)有限公司 一种移动终端的语音识别方法及其装置

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070249406A1 (en) * 2006-04-20 2007-10-25 Sony Ericsson Mobile Communications Ab Method and system for retrieving information
CN101262490B (zh) * 2008-02-29 2011-11-23 中兴通讯股份有限公司 监控***
CN201307863Y (zh) * 2008-11-14 2009-09-09 成都绿芽科技发展有限公司 一种爱心智能机器
CN101656874A (zh) * 2009-09-17 2010-02-24 杭州智傲科技有限公司 一种远程视频监控方法
CN102708864A (zh) * 2011-03-28 2012-10-03 德信互动科技(北京)有限公司 基于对话的家用电子设备和控制方法
CN103136905A (zh) * 2011-11-25 2013-06-05 厦门瑞科技术有限公司 3g移动物联监控报警终端
CN203206395U (zh) * 2013-04-19 2013-09-18 福建亿榕信息技术有限公司 一种智能犯罪监控***
CN103501382B (zh) * 2013-09-17 2015-06-24 小米科技有限责任公司 语音服务提供方法、装置和终端
CN103729988A (zh) * 2014-01-15 2014-04-16 陈蜀乔 一种采用旧智能***控公共设施无线网络传输报警***
CN103949072B (zh) * 2014-04-16 2016-03-30 上海元趣信息技术有限公司 智能玩具交互、传输方法及智能玩具

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106685929A (zh) * 2016-12-06 2017-05-17 南京金雀智能科技有限公司 基于可穿戴式蓝牙视频耳机的通信处理***及方法
CN111107548A (zh) * 2019-01-07 2020-05-05 姜鹏飞 一种发送信息的方法、装置、设备及存储介质
CN111901655A (zh) * 2020-08-05 2020-11-06 海信视像科技股份有限公司 一种显示设备及摄像头功能的演示方法
CN112256871A (zh) * 2020-10-16 2021-01-22 国网江苏省电力有限公司连云港供电分公司 一种物资履约***及方法
CN112256871B (zh) * 2020-10-16 2021-05-07 国网江苏省电力有限公司连云港供电分公司 一种物资履约***及方法
CN112735413A (zh) * 2020-12-25 2021-04-30 浙江大华技术股份有限公司 一种基于摄像装置的指令分析方法、电子设备和存储介质
CN112735413B (zh) * 2020-12-25 2024-05-31 浙江大华技术股份有限公司 一种基于摄像装置的指令分析方法、电子设备和存储介质
CN112801083A (zh) * 2021-01-29 2021-05-14 百度在线网络技术(北京)有限公司 图像识别的方法、装置、设备以及存储介质
CN112801083B (zh) * 2021-01-29 2023-08-08 百度在线网络技术(北京)有限公司 图像识别的方法、装置、设备以及存储介质
CN113140138A (zh) * 2021-04-25 2021-07-20 新东方教育科技集团有限公司 互动教学方法、装置、存储介质及电子设备

Also Published As

Publication number Publication date
CN105407316A (zh) 2016-03-16
CN105407316B (zh) 2019-05-31

Similar Documents

Publication Publication Date Title
WO2016026446A1 (zh) 智能摄像***的实现方法、智能摄像***和网络摄像头
WO2016026447A1 (zh) 智能摄像***的报警方法、智能摄像***和网络摄像头
US9621950B2 (en) TV program identification method, apparatus, terminal, server and system
EP2688296B1 (en) Video monitoring system and method
US10212107B2 (en) Methods and devices for controlling machines
US9749710B2 (en) Video analysis system
WO2014106384A1 (zh) 一种监控录像信息提供方法、装置及视频监控***
US20160239540A1 (en) Data Query Method and Apparatus, Server, and System
WO2018077214A1 (zh) 信息搜索方法和装置
CN105872838A (zh) 即时视频的媒体特效发送方法和装置
JP2020042834A (ja) アカウント情報取得方法、端末、サーバ、およびシステム
WO2015027717A1 (zh) 网络视频的播放方法、装置及终端设备
JP6986187B2 (ja) 人物識別方法、装置、電子デバイス、記憶媒体、及びプログラム
WO2013117085A1 (zh) 实现录像检索的方法、设备和***
WO2015062224A1 (en) Tv program identification method, apparatus, terminal, server and system
WO2022083343A1 (zh) 用于检测视频监控设备的方法和电子设备
TW201719502A (zh) 動態影像之物件辨識方法及自動截取目標圖像的互動式影片建立方法
WO2015043547A1 (en) A method, device and system for message response cross-reference to related applications
TW201724038A (zh) 監測服務裝置、電腦程式產品、藉由影像監測提供服務之方法及藉由影像監測啓用服務之方法
WO2014180263A1 (zh) 一种噪音处理方法、装置及***、存储介质
WO2017156934A1 (zh) 智能互联方法和智能终端
WO2015090099A1 (zh) 一种算法配置方法和***
WO2015027882A1 (en) Method, apparatus and terminal for image processing
US20160105620A1 (en) Methods, apparatus, and terminal devices of image processing
WO2017026154A1 (ja) 情報処理装置、情報処理方法及びプログラム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15833554

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15833554

Country of ref document: EP

Kind code of ref document: A1