CN111613225A - Method and system for automatically reporting road violation based on voice and image processing - Google Patents


Info

Publication number
CN111613225A
Authority
CN
China
Prior art keywords
image
processing
reported
voice
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010344544.5A
Other languages
Chinese (zh)
Inventor
陈静静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202010344544.5A
Publication of CN111613225A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4084 Scaling of whole images or parts thereof, e.g. expanding or contracting in the transform domain, e.g. fast Fourier transform [FFT] domain scaling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention relates to the technical field of artificial intelligence and provides a method, a system, an electronic device and a computer-readable storage medium for automatically reporting road violations based on voice and image processing. The method comprises the following steps: acquiring a user intention through natural language processing and natural language generation; acquiring an image to be reported from the automobile data recorder according to the user intention; obtaining classification information corresponding to the image to be reported through a convolutional neural network; judging whether the image to be reported is a violation image by comparing the classification information corresponding to the image to be reported with a preset violation standard image; and, if the image to be reported is a violation image, reporting the image to be reported. In addition, the invention also relates to blockchain technology: the preset violation standard image can be stored in a blockchain.

Description

Method and system for automatically reporting road violation based on voice and image processing
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a road violation reporting method and system based on voice and image processing, an electronic device and a computer readable storage medium.
Background
While vehicles are on the road, traffic violations are generally captured by cameras so that the violator can be penalized. Where there is no camera, however, violations are numerous and go unrecorded, for example a vehicle driving in the emergency lane on a road section without cameras, or changing lanes at high speed without signaling. For violations that cameras cannot capture, vehicle owners find the reporting process tedious: a report cannot be made in real time while driving, so most reports are made after the drive is finished, or are forgotten afterwards.
Although most vehicle owners install an automobile data recorder, extracting a recording from it is complicated and the operation flow is tedious, which makes reporting violations troublesome.
Based on the above problems, the inventor has realized that the conventional road violation reporting method cannot meet the needs of vehicle owners, and that an automatic road violation reporting method is therefore urgently needed.
Disclosure of Invention
The invention provides a method, a system, an electronic device and a computer-readable storage medium for automatically reporting road violations based on voice and image processing. Its main aim is to process voice through natural language processing and natural language generation, and to process images shot by the automobile data recorder through a convolutional neural network to obtain violation images, thereby solving the problem that the existing road violation reporting process is complex and tedious and cannot meet the needs of vehicle owners.
In addition, in order to achieve the above object, the present invention provides an automatic road violation reporting method based on voice and image processing, which is applied to an electronic device, and the method includes:
processing the user voice through natural language processing and natural language generation to obtain the user intention;
acquiring an image to be reported in the automobile data recorder according to the user intention;
processing the image to be reported through a convolutional neural network to obtain classification information corresponding to the image to be reported;
judging whether the image to be reported is a violation image by comparing the classification information corresponding to the image to be reported with a preset violation standard image;
and if the image to be reported is a violation image, reporting the image to be reported.
Preferably, the step of processing the user voice through natural language processing and natural language generation to obtain the user intention includes:
processing the user voice through the natural language processing, and converting the user voice into text;
and analyzing the converted text through the natural language generation to acquire the text information representing the user intention in the text.
Preferably, the step of processing the user voice through the natural language processing and converting the user voice into text includes:
preprocessing the user voice and then extracting features;
and performing pattern matching between the extracted features and the voice signals in the voice model library to convert the user voice into text.
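As an illustration only, the two sub-steps above (preprocessing with feature extraction, then pattern matching against a voice model library) might be sketched as follows. The pre-emphasis filter, the per-frame log-energy feature, the Euclidean distance, and the names `preemphasize`, `frame_log_energy`, `extract_features`, `match_text` and `model_library` are assumptions made for this sketch; the source does not specify the features or the matching algorithm, and a practical recognizer would use much richer features and models.

```python
import math

def preemphasize(samples, coeff=0.97):
    """Preprocessing (assumed): boost high frequencies with a first-order filter."""
    return [samples[0]] + [samples[i] - coeff * samples[i - 1]
                           for i in range(1, len(samples))]

def frame_log_energy(samples, frame_size=16):
    """Feature extraction (assumed): one log-energy value per non-overlapping frame."""
    features = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energy = sum(x * x for x in frame) / frame_size
        features.append(math.log(energy + 1e-9))
    return features

def extract_features(samples):
    """Preprocess the signal, then extract its feature sequence."""
    return frame_log_energy(preemphasize(samples))

def match_text(features, model_library):
    """Pattern matching: return the text whose stored template is nearest."""
    def distance(template):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(features, template)))
    return min(model_library, key=lambda text: distance(model_library[text]))

# Hypothetical voice model library: text -> feature template of a reference signal.
loud, quiet = [1.0, -1.0] * 8, [0.1, -0.1] * 8
model_library = {
    "report violation": extract_features(loud + quiet),
    "cancel": extract_features(quiet + loud),
}

# A slightly different rendition of the first reference signal.
utterance = [0.9, -0.9] * 8 + [0.12, -0.12] * 8
print(match_text(extract_features(utterance), model_library))
```

The synthetic signals stand in for recorded speech; only the shape of the flow (preprocess, extract, match) is taken from the text.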
Preferably, the step of analyzing the converted text through the natural language generation to acquire the text information representing the user intention includes:
using the built deep learning model, performing context understanding and semantic disambiguation on a plurality of phrases of the received text in combination with the context, to obtain a semantic result for each phrase;
comparing the semantic result of each phrase with the phrases of a knowledge graph to obtain a similarity value for each phrase;
taking the knowledge graph phrase with the highest similarity value as the semantic result of each phrase, thereby acquiring the semantic results of the plurality of phrases;
and combining the semantic results of the plurality of phrases to generate a semantic understanding result of the text information, and acquiring the text information intended by the user according to the semantic understanding result.
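The comparison against the knowledge graph can be illustrated with a minimal sketch. It assumes that the deep learning disambiguation step has already produced plain-text phrases, and it substitutes the character-level ratio of Python's `difflib.SequenceMatcher` for whatever similarity measure the invention actually uses; the phrase list and the names `best_match` and `understand` are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical knowledge graph phrases; the source does not list any.
KNOWLEDGE_GRAPH_PHRASES = [
    "report violation",
    "emergency lane",
    "lane change",
    "no turn signal",
]

def best_match(phrase, graph_phrases):
    """Return (similarity, graph_phrase) for the most similar graph phrase."""
    scored = [(SequenceMatcher(None, phrase.lower(), g).ratio(), g)
              for g in graph_phrases]
    return max(scored)

def understand(phrases, graph_phrases):
    """Map each phrase to its best graph phrase and combine the results."""
    return [best_match(p, graph_phrases)[1] for p in phrases]

print(understand(["report a violation", "emergency lanes"],
                 KNOWLEDGE_GRAPH_PHRASES))
```

A string ratio is only a stand-in: it captures the "highest similarity value wins" rule from the text, not semantic similarity.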
Preferably, the step of processing the acquired images in the automobile data recorder through the convolutional neural network to acquire classification information corresponding to the images includes:
preprocessing the acquired images in the automobile data recorder;
extracting image features of the preprocessed image through a convolutional neural network to obtain the features of each element image, and quantizing the features to obtain feature vectors;
acquiring target classification feature information according to the feature vectors, wherein the target classification feature information comprises a plurality of target categories and a target feature vector corresponding to each target category;
determining the classification corresponding to the image according to the element images and the target features in the target classification feature information; and matching the element images with the target features in the target classification feature information to obtain the category information corresponding to each element image.
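The matching of element-image feature vectors against per-category target feature vectors can be sketched as a nearest-target lookup. The convolutional feature extraction itself is not shown; the vectors below are made-up placeholders, cosine similarity is an assumed matching rule, and the category names and the functions `cosine`, `classify_element` and `classify_image` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical target classification feature information:
# target category -> target feature vector.
TARGET_CLASSES = {
    "emergency_lane_driving": [0.9, 0.1, 0.0],
    "unsignaled_lane_change": [0.1, 0.9, 0.2],
    "normal_driving": [0.0, 0.1, 0.9],
}

def classify_element(feature_vector, target_classes):
    """Return the target category whose feature vector is most similar."""
    return max(target_classes,
               key=lambda name: cosine(feature_vector, target_classes[name]))

def classify_image(element_vectors, target_classes):
    """Category information for each element image of one image."""
    return [classify_element(v, target_classes) for v in element_vectors]

# Placeholder feature vectors standing in for CNN output.
print(classify_image([[0.8, 0.2, 0.1], [0.05, 0.2, 0.95]], TARGET_CLASSES))
```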
Preferably, the step of preprocessing the acquired images in the automobile data recorder comprises:
grouping the images to be processed according to their aspect ratio information to obtain a plurality of groups of images to be processed;
setting template image information corresponding to each group of images to be processed, wherein the template image information comprises width information and height information;
proportionally enlarging or reducing all images to be processed in the same group until the width of each image is not larger than the width information of the template image and the height of each image is not larger than the height information of the template image;
and taking the template image as a frame, and carrying out configuration processing on the proportionally enlarged or reduced image to be processed.
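The geometry of this preprocessing can be sketched on image sizes alone, without touching pixel data. The sketch assumes that "configuration processing" means placing the scaled image centered inside the template frame, which the source does not state; the rounding precision used for grouping and the names `group_by_aspect_ratio` and `fit_to_template` are likewise assumptions.

```python
def group_by_aspect_ratio(sizes, precision=1):
    """Group (width, height) pairs by aspect ratio rounded to `precision` places."""
    groups = {}
    for width, height in sizes:
        groups.setdefault(round(width / height, precision), []).append((width, height))
    return groups

def fit_to_template(width, height, tmpl_w, tmpl_h):
    """Scale proportionally so the image fits the template, then center it.

    Returns (new_width, new_height, x_offset, y_offset). Integer arithmetic
    is used so that the result never exceeds the template size.
    """
    if tmpl_w * height <= tmpl_h * width:
        # Width is the limiting dimension.
        new_w, new_h = tmpl_w, height * tmpl_w // width
    else:
        # Height is the limiting dimension.
        new_w, new_h = width * tmpl_h // height, tmpl_h
    # Assumed "configuration": center the scaled image in the template frame.
    return new_w, new_h, (tmpl_w - new_w) // 2, (tmpl_h - new_h) // 2

sizes = [(1920, 1080), (1280, 720), (640, 480)]
print(group_by_aspect_ratio(sizes))
print(fit_to_template(1920, 1080, 640, 640))
```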
In order to achieve the above object, the present invention further provides a system for automatically reporting road violation based on voice and image processing, including:
the user intention acquisition module is used for processing the user voice through natural language processing and natural language generation to acquire the user intention;
the image acquisition module is used for acquiring an image to be reported in the automobile data recorder according to the user intention;
the classification information acquisition module is used for processing the image to be reported through a convolutional neural network to acquire classification information corresponding to the image to be reported;
the judgment result acquisition module is used for judging whether the image to be reported is a violation image by comparing the classification information corresponding to the image to be reported with a preset violation standard image;
and the violation reporting processing module is used for reporting the image to be reported if the image to be reported is a violation image.
In order to achieve the above object, the present invention further provides an electronic device, which includes a memory and a processor, wherein the memory includes an automatic road violation reporting program, and when executed by the processor, the automatic road violation reporting program implements the following steps:
processing the user voice through natural language processing and natural language generation to obtain the user intention;
acquiring an image to be reported in the automobile data recorder according to the user intention;
processing the image to be reported through a convolutional neural network to obtain classification information corresponding to the image to be reported;
judging whether the image to be reported is a violation image by comparing the classification information corresponding to the image to be reported with a preset violation standard image;
and if the image to be reported is a violation image, reporting the image to be reported.
Preferably, the step of processing the acquired images in the automobile data recorder through the convolutional neural network to acquire classification information corresponding to the images includes:
preprocessing the acquired images in the automobile data recorder;
extracting image features of the preprocessed image through a convolutional neural network to obtain the features of each element image, and quantizing the features to obtain feature vectors;
acquiring target classification feature information according to the feature vectors, wherein the target classification feature information comprises a plurality of target categories and a target feature vector corresponding to each target category;
determining the classification corresponding to the image according to the element images and the target features in the target classification feature information; and matching the element images with the target features in the target classification feature information to obtain the category information corresponding to each element image.
In addition, in order to achieve the above object, the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a program for automatically reporting a road violation based on voice and image processing, and when the program for automatically reporting a road violation based on voice and image processing is executed by a processor, any step in the method for automatically reporting a road violation based on voice and image processing is implemented.
According to the method, the system, the electronic device and the computer-readable storage medium for automatically reporting the road violation based on the voice and image processing, the voice is processed through natural language processing and natural language generation, the image shot by the automobile data recorder is processed through the convolutional neural network, and the violation image is obtained, so that the problem that the existing road violation reporting process is complex and tedious and cannot meet the requirements of vehicle owners is solved.
Drawings
Fig. 1 is a schematic view of an application environment of a preferred embodiment of an automatic road violation reporting method based on voice and image processing according to the present invention;
Fig. 2 is a block diagram of an exemplary embodiment of an automatic road violation reporting system based on voice and image processing according to the present invention;
Fig. 3 is a flowchart of a preferred embodiment of the method for automatically reporting road violations based on voice and image processing according to the present invention;
Fig. 4 is a detailed flowchart of the method for automatically reporting road violations based on voice and image processing according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides an automatic reporting method for road violation, which is applied to an electronic device 1. Fig. 1 is a schematic view of an application environment of a preferred embodiment of the method for automatically reporting a road violation according to the present invention.
In the present embodiment, the electronic device 1 may be a terminal device having an arithmetic function, such as a server, a smart phone, a tablet computer, a portable computer, or a desktop computer.
The electronic device 1 includes: a processor 12, a memory 11, a network interface 14, and a communication bus 15.
The memory 11 includes at least one type of readable storage medium. The at least one type of readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, a card-type memory, and the like. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. In other embodiments, the readable storage medium may also be an external memory of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), and the like, provided on the electronic device 1.
In the present embodiment, the readable storage medium of the memory 11 is generally used for storing the automatic road violation reporting program 10 installed in the electronic device 1, an application (APP) corresponding to a two-dimensional code, and the like. The memory 11 may also be used to temporarily store data that has been output or is to be output.
In some embodiments, the processor 12 may be a Central Processing Unit (CPU), a microprocessor or another data processing chip, and is configured to run program code stored in the memory 11 or to process data, for example to execute the automatic road violation reporting program 10.
The network interface 14 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface), and is typically used to establish a communication link between the electronic device 1 and other electronic devices.
The communication bus 15 is used to realize connection communication between these components.
Fig. 1 only shows the electronic device 1 with components 11-15, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may alternatively be implemented.
Optionally, the electronic device 1 may further include a call interface. The call interface may include an input unit such as a keyboard, a voice input device such as a microphone or another device with a voice recognition function, and a voice output device such as a loudspeaker or a headset; optionally, the call interface may further include a standard wired interface and a wireless interface.
Optionally, the electronic device 1 may further comprise a display, which may also be referred to as a display screen or a display unit. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an Organic Light-Emitting Diode (OLED) touch device, or the like. The display is used for displaying information processed in the electronic device 1 and for displaying a visual caller-side interface.
Optionally, the electronic device 1 further comprises a touch sensor. The area provided by the touch sensor and used for the calling terminal to perform touch operation is called a touch area. Further, the touch sensor described herein may be a resistive touch sensor, a capacitive touch sensor, or the like. The touch sensor may include not only a contact type touch sensor but also a proximity type touch sensor. Further, the touch sensor may be a single sensor, or may be a plurality of sensors arranged in an array, for example.
The area of the display of the electronic device 1 may be the same as or different from the area of the touch sensor. Optionally, a display is stacked with the touch sensor to form a touch display screen. The device detects touch operation triggered by a calling terminal based on a touch display screen.
Optionally, the electronic device 1 may further include a Radio Frequency (RF) circuit, a sensor, an audio circuit, and the like, which are not described herein again.
In the embodiment of the apparatus shown in fig. 1, the memory 11 as a computer storage medium may include an operating system and an automatic road violation reporting program 10 based on voice and image processing; the processor 12 executes the automatic road violation reporting program 10 based on voice and image processing stored in the memory 11 to implement the following steps:
processing the user voice through natural language processing and natural language generation to obtain the user intention;
acquiring an image to be reported in the automobile data recorder according to the user intention;
processing the image to be reported through a convolutional neural network to obtain classification information corresponding to the image to be reported;
judging whether the image to be reported is a violation image by comparing the classification information corresponding to the image to be reported with a preset violation standard image;
and if the image to be reported is a violation image, reporting the image to be reported.
Preferably, the step of processing the user voice through natural language processing and natural language generation to obtain the user intention includes:
processing the user voice through the natural language processing, and converting the user voice into text;
and analyzing the converted text through the natural language generation, acquiring the text information representing the user intention in the text, and replying the text information to the user in voice form.
Preferably, the step of processing the user voice through the natural language processing and converting the user voice into text includes:
preprocessing the user voice and then extracting features;
and performing pattern matching between the extracted features and the voice signals in the voice model library to convert the user voice into text.
Preferably, the step of analyzing the converted text through the natural language generation to acquire the text information representing the user intention includes:
using the built deep learning model, performing context understanding and semantic disambiguation on a plurality of phrases of the received text in combination with the context, to obtain a semantic result for each phrase;
comparing the semantic result of each phrase with the phrases of a knowledge graph to obtain a similarity value for each phrase;
taking the knowledge graph phrase with the highest similarity value as the semantic result of each phrase, thereby acquiring the semantic results of the plurality of phrases;
and combining the semantic results of the plurality of phrases to generate a semantic understanding result of the text information, and acquiring the text information intended by the user according to the semantic understanding result.
Preferably, the step of processing the acquired images in the automobile data recorder through the convolutional neural network to acquire classification information corresponding to the images includes:
preprocessing the acquired images in the automobile data recorder;
extracting image features of the preprocessed image through a convolutional neural network to obtain the features of each element image, and quantizing the features to obtain feature vectors;
acquiring target classification feature information according to the feature vectors, wherein the target classification feature information comprises a plurality of target categories and a target feature vector corresponding to each target category;
determining the classification corresponding to the image according to the element images and the target features in the target classification feature information; and matching the element images with the target features in the target classification feature information to obtain the category information corresponding to each element image.
Preferably, the step of preprocessing the acquired images in the automobile data recorder comprises:
grouping the images to be processed according to their aspect ratio information to obtain a plurality of groups of images to be processed;
setting template image information corresponding to each group of images to be processed, wherein the template image information comprises width information and height information;
proportionally enlarging or reducing all images to be processed in the same group until the width of each image is not larger than the width information of the template image and the height of each image is not larger than the height information of the template image;
and taking the template image as a frame, and carrying out configuration processing on the proportionally enlarged or reduced image to be processed.
It is emphasized that, in order to further ensure the privacy and security of the preset violation standard image, the preset violation standard image may also be stored in a node of a blockchain.
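The source does not say how the standard images are written to the blockchain. As a rough illustration of why hash-linked storage makes tampering detectable, the following sketch records only a SHA-256 digest of each standard image in a chain of linked records. It is not a distributed ledger and does not by itself provide privacy; the record layout and the names `GENESIS`, `make_block` and `verify_chain` are assumptions made for this sketch.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first record

def _digest(payload):
    """SHA-256 over a canonical JSON encoding of the payload."""
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()

def make_block(prev_hash, image_bytes, label):
    """Create a record that stores the image digest and links to the previous record."""
    payload = {
        "prev_hash": prev_hash,
        "label": label,
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),
    }
    return {**payload, "hash": _digest(payload)}

def verify_chain(chain):
    """Check every record's own hash and its link to the previous record."""
    prev_hash = GENESIS
    for block in chain:
        payload = {k: block[k] for k in ("prev_hash", "label", "image_sha256")}
        if block["prev_hash"] != prev_hash or block["hash"] != _digest(payload):
            return False
        prev_hash = block["hash"]
    return True

# Placeholder bytes standing in for two violation standard images.
first = make_block(GENESIS, b"standard-image-a", "emergency_lane_driving")
second = make_block(first["hash"], b"standard-image-b", "unsignaled_lane_change")
print(verify_chain([first, second]))
```

Changing any stored field of an earlier record changes its hash, so the link held by the next record no longer matches and verification fails.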
The electronic device 1 provided in the above embodiment processes the voice through natural language processing and natural language generation, and processes the images shot by the automobile data recorder through the convolutional neural network to obtain violation images, thereby solving the problem that the existing road violation reporting process is complex and tedious and cannot meet the needs of vehicle owners.
In other embodiments, the invention further provides an automatic road violation reporting system based on voice and image processing. Referring to Fig. 2, the automatic road violation reporting system 100 includes: a user intention acquisition module 110, an image acquisition module 120, a classification information acquisition module 130, a judgment result acquisition module 140, and a violation reporting processing module 150, wherein,
a user intention obtaining module 110, configured to process a user voice through natural language processing and natural language generation to obtain a user intention;
the image acquisition module 120 is configured to acquire an image to be reported in the automobile data recorder according to the user intention;
a classification information obtaining module 130, configured to process the image to be reported through a convolutional neural network, and obtain classification information corresponding to the image to be reported;
the judgment result acquisition module 140 is configured to judge whether the image to be reported is a violation image by comparing the classification information corresponding to the image to be reported with a preset violation standard image;
and the violation reporting processing module 150 is configured to report the image to be reported if the image to be reported is a violation image.
The user intention acquisition module 110 includes a text conversion module and a text information acquisition module, wherein the text conversion module is used for processing the user voice through the natural language processing and converting the user voice into text;
and the text information acquisition module is used for analyzing the converted text through the natural language generation, acquiring the text information representing the user intention in the text, and replying the text information to the user in voice form.
The classification information acquisition module 130 includes: an image preprocessing module, a feature vector acquisition module, a target classification feature information acquisition module and an image classification determining module, wherein,
the image preprocessing module is used for preprocessing the acquired images in the automobile data recorder;
the feature vector acquisition module is used for extracting image features of the preprocessed image through a convolutional neural network to obtain the features of each element image, and quantizing the features to obtain feature vectors;
the target classification feature information acquisition module is used for acquiring target classification feature information according to the feature vectors, wherein the target classification feature information comprises a plurality of target categories and a target feature vector corresponding to each target category;
the image classification determining module is used for determining the classification corresponding to the image according to the element images and the target features in the target classification feature information; and matching the element images with the target features in the target classification feature information to obtain the category information corresponding to each element image.
In addition, the invention also provides a method for automatically reporting the road violation based on voice and image processing. Fig. 3 is a flowchart illustrating an exemplary embodiment of a method for automatically reporting road violations based on voice and image processing according to the present invention. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the method for automatically reporting a road violation based on voice and image processing includes: step S110-step S150.
S110: processing the user voice through natural language processing and natural language generation to obtain the user intention;
S120: acquiring an image to be reported in the automobile data recorder according to the user intention;
S130: processing the image to be reported through a convolutional neural network to obtain classification information corresponding to the image to be reported;
S140: judging whether the image to be reported belongs to an illegal image or not by comparing the classification information corresponding to the image to be reported with a preset illegal standard image;
S150: if the image to be reported belongs to the illegal image, reporting the image to be reported.
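The control flow of steps S110 to S150 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `run_reporting_flow`, the intent string `"report_violation"`, and the injected callables are all hypothetical, and the comparison against preset standard images in S140 is stood in for by a simple set of violation class labels.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Set


@dataclass
class ReportResult:
    reported: bool
    reason: str


def run_reporting_flow(
    audio: bytes,
    get_intent: Callable[[bytes], str],             # S110 (assumed interface)
    fetch_frames: Callable[[], Sequence[object]],   # S120 (assumed interface)
    classify: Callable[[object], str],              # S130 (assumed interface)
    violation_classes: Set[str],                    # S140 stand-in for standard images
    submit: Callable[[object, str], bool],          # S150 (assumed interface)
) -> ReportResult:
    # S110: derive the user intention from the voice input.
    if get_intent(audio) != "report_violation":
        return ReportResult(False, "not a reporting request")
    # S120: obtain candidate images from the automobile data recorder.
    for frame in fetch_frames():
        # S130: classify the image.
        label = classify(frame)
        # S140: decide whether the class counts as a violation.
        if label in violation_classes:
            # S150: report and pass the outcome back to the caller.
            ok = submit(frame, label)
            return ReportResult(ok, "reported" if ok else "upload failed")
    return ReportResult(False, "no violation found")
```

Passing the stages in as callables keeps the sketch independent of any particular speech, camera, or upload component.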
In this embodiment, the method combines an intelligent voice assistant with the automobile data recorder so that a violation scene is reported automatically. This one-key automatic reporting mode simplifies the reporting process and thereby increases the likelihood that a vehicle owner reports a violation on their own initiative.
In step S110, if the driver notices a road violation, the driver wakes the in-vehicle voice assistant; the voice assistant processes the received voice through natural language processing and natural language generation and responds to the vehicle owner.
The intelligent voice assistant depends on Natural Language Processing (NLP) and Natural Language Generation (NLG). When a message is sent to the intelligent voice assistant, the assistant picks it up and uses NLP, which converts the speech to text, to determine what the user is saying: NLP listens to what the user says, breaks it down into small units, and analyzes them to produce output or information in the form of text. NLG is a system that generates natural language using artificial intelligence and computational linguistics, and it can also convert text into speech. The NLG system first determines the information to be expressed as text, then organizes the structure of the expression, and, using a set of grammar rules, forms complete sentences and reads them out.
In an embodiment of the present invention, the processing the user speech through natural language processing and natural language generation to obtain the user intention includes:
step S111: processing the user voice through the natural language processing, and converting the user voice into a text;
step S112: analyzing the converted text through the natural language generation to acquire the text information in the text that represents the user's intention.
The obtained text information representing the user's intention is then replied to the user in voice form.
The step of processing the user voice through the natural language processing and converting the user voice into text includes:
First step: preprocessing the user voice, that is, processing the voice signal into a form from which features can be conveniently extracted;
Second step: extracting features from the preprocessed voice;
Third step: pattern-matching the extracted features against the voice signals in the voice model library to convert the user voice into text.
The step of analyzing the converted text through the natural language generation to obtain the text information representing the user's intention includes:
First step: using the built deep learning model, performing context understanding and semantic disambiguation on a plurality of phrases of the received text in combination with their context, to obtain a semantic result for each phrase;
Second step: comparing the semantic result of each phrase with the phrases of a knowledge graph to obtain similarity values for each phrase;
Third step: for each phrase, taking the knowledge-graph phrase with the highest similarity value as that phrase's semantic result, thereby obtaining the semantic results of the plurality of phrases;
Fourth step: combining the semantic results of the plurality of phrases to generate a semantic understanding result of the text information, and acquiring the text information intended by the user.
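The "highest similarity value" selection in the steps above can be illustrated with a small sketch. The text does not specify the similarity measure; here the character-level ratio from Python's standard `difflib.SequenceMatcher` is swapped in for the deep-learning-based semantic comparison, and the phrase table `KNOWLEDGE_PHRASES` and its intent labels are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, List, Tuple

# Hypothetical knowledge-graph phrases mapped to intent labels (assumption).
KNOWLEDGE_PHRASES: Dict[str, str] = {
    "report violation": "REPORT_VIOLATION",
    "play music": "PLAY_MUSIC",
    "navigate home": "NAVIGATE",
}


def best_match(phrase: str, candidates: Iterable[str]) -> Tuple[str, float]:
    """Return the candidate phrase with the highest similarity to `phrase`."""
    scored = [
        (SequenceMatcher(None, phrase.lower(), cand).ratio(), cand)
        for cand in candidates
    ]
    score, cand = max(scored)
    return cand, score


def understand(phrases: List[str]) -> List[str]:
    """Map each recognized phrase to the intent of its closest known phrase."""
    return [KNOWLEDGE_PHRASES[best_match(p, KNOWLEDGE_PHRASES)[0]] for p in phrases]
```

A character-level ratio only captures surface similarity; a real system would compare semantic representations, as the text describes.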
In a specific embodiment of the invention, a vehicle owner notices while driving that the vehicle ahead changes lanes without using its turn signal. The owner wakes the voice assistant with "hello, xx" and, after the assistant has been woken and has responded, issues the next command: "report a violation". The voice assistant converts the speech into text through NLP, decomposes the text into units (converting it into machine language so that it can be analyzed), analyzes it, and generates a corresponding reply. The reply is composed in natural language through NLG and read back to the owner, for example: "Do you confirm reporting the violation?" The next operation is carried out after the owner confirms.
In step S120, if the request is a violation reporting request, the automobile data recorder is called to acquire the video from 30 s before the request to 10 s after it. The voice assistant is connected to the automobile data recorder through an API data interface. When the user sends a violation reporting request, the voice assistant recognizes the intention of the request and passes it directly to the automobile data recorder through that interface, so that the user can call up the recorder's video resources by voice.
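The 30 s / 10 s window around the request can be computed as in the sketch below. The function name `clip_window` and the clamping to the start of the recording are assumptions added for illustration; the text only states the two durations.

```python
from typing import Tuple


def clip_window(
    request_ts: float,
    before_s: float = 30.0,
    after_s: float = 10.0,
    recording_start: float = 0.0,
) -> Tuple[float, float]:
    """Return (start, end) of the clip in seconds on the recorder's timeline.

    The start is clamped so it never precedes the beginning of the
    recording (an assumption; the source does not cover this case).
    """
    start = max(recording_start, request_ts - before_s)
    end = request_ts + after_s
    return start, end
```

For a request at 100 s this yields the interval from 70 s to 110 s; a request at 12 s is clamped to start at 0 s.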
In an embodiment of the invention, images are captured from the video by sampling at fixed time intervals, and the captured images are then analyzed. The API is a unified interface exposed to the outside: pictures passed in through the API are classified, analyzed and cropped by the algorithm layer and the engine layer, and the result is finally output through the API. The resource management layer and the hardware base layer provide the resources and hardware for the picture analysis function.
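Fixed-interval sampling can be sketched as below. The interval length and frame rate are not given in the text, so `interval_s` and `fps` are illustrative parameters, and both helper names are hypothetical.

```python
from typing import List


def sample_timestamps(start_s: float, end_s: float, interval_s: float) -> List[float]:
    """Timestamps from start_s to end_s (inclusive) at a fixed interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Small epsilon guards against floating-point undershoot at the end point.
    count = int((end_s - start_s) / interval_s + 1e-9) + 1
    return [start_s + i * interval_s for i in range(count)]


def frame_indices(timestamps: List[float], fps: float, clip_start_s: float) -> List[int]:
    """Convert timestamps to frame indices relative to the start of the clip."""
    return [round((t - clip_start_s) * fps) for t in timestamps]
```

The resulting indices would then be used to pull individual frames from the clip with whatever video library the system uses.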
In step S130, while the image is analyzed, the features of each element image are extracted through a convolutional neural network. The convolutional neural network is an image processing neural network used to extract features of the element image, such as shape and color, and to express those features quantitatively as feature vectors. The target classification feature information includes a plurality of target classes and the target feature vector corresponding to each target class. A matching degree calculation formula can be used to calculate the matching degree between the element image and each target class in the target classification feature information, and the class information corresponding to each element image is obtained from the calculated matching degrees.
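The text does not state which matching degree formula is used. As one plausible choice, the sketch below uses cosine similarity between an element image's feature vector and each class's target feature vector; the class names and vectors are made up for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_class(
    feature: Sequence[float],
    class_vectors: Dict[str, Sequence[float]],
) -> Tuple[str, float]:
    """Return the target class whose vector best matches `feature`."""
    scores = {name: cosine(feature, vec) for name, vec in class_vectors.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

In practice the feature vectors would come from the convolutional neural network rather than being written by hand.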
The step of processing the acquired images in the automobile data recorder through the convolutional neural network to acquire classification information corresponding to the images comprises the following steps of:
S131: preprocessing the acquired images in the automobile data recorder;
S132: extracting image features of the preprocessed image through a convolutional neural network to obtain the features of each element image, and quantizing the features to obtain feature vectors;
S133: acquiring target classification characteristic information according to the characteristic vectors, wherein the target classification characteristic information comprises a plurality of target categories and target characteristic vectors corresponding to each target category;
S134: determining the classification corresponding to the image according to the element image and the target characteristics in the target classification characteristic information; and matching the element images with the target features in the target classification feature information to obtain the category information corresponding to each element image.
Specifically, the step of preprocessing the acquired images in the automobile data recorder comprises:
First step: grouping the images to be processed according to their aspect ratio (length-width ratio) information to obtain a plurality of groups of images to be processed;
Second step: setting template image information corresponding to each group of images to be processed, wherein the template image information comprises width information and height information;
Third step: proportionally enlarging or reducing all images to be processed in the same group until the width of each image is not larger than the width of the template image and its height is not larger than the height of the template image;
Fourth step: using the template image as a frame, performing configuration (layout) processing on the proportionally scaled images to be processed.
Picture characteristics of the pictures to be processed may also be obtained, for example (but not limited to) picture sharpness or contrast, and the groups of pictures to be processed may be regrouped according to these characteristics.
The step of setting the template picture information corresponding to each group of pictures to be processed includes:
S1: reading the height and width information of all the pictures to be processed in the same group;
S2: comparing the width information of the pictures to be processed to obtain the maximum width value, and comparing their height information to obtain the maximum height value;
S3: setting a template picture according to the maximum width value and the maximum height value, so that the height of the template picture is the maximum height value and its width is the maximum width value.
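The preprocessing geometry described above (grouping by aspect ratio, deriving a template from the maximum width and height, and proportional scaling) can be sketched on image sizes alone. The function names are hypothetical, exact-ratio grouping via `Fraction` is an assumption (a real system might bucket nearby ratios together), and the layout step itself is left out.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, List, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def group_by_aspect(sizes: List[Size]) -> Dict[Fraction, List[Size]]:
    """Group image sizes by their exact width/height ratio."""
    groups: Dict[Fraction, List[Size]] = defaultdict(list)
    for w, h in sizes:
        groups[Fraction(w, h)].append((w, h))
    return dict(groups)


def template_size(group: List[Size]) -> Size:
    """Template width and height are the maxima within the group (S1 to S3)."""
    return max(w for w, _ in group), max(h for _, h in group)


def fit_into_template(size: Size, template: Size) -> Size:
    """Scale proportionally to the largest size that fits inside the template.

    Integer arithmetic is used so the result never exceeds the template
    because of floating-point rounding.
    """
    w, h = size
    tw, th = template
    if tw * h <= th * w:              # width is the limiting dimension
        return tw, h * tw // w
    return w * th // h, th            # height is the limiting dimension
```

For example, a 1280x720 image in a group whose template is 1920x1080 is enlarged to exactly 1920x1080, while a 640x480 image fitted into the same template becomes 1440x1080.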
In steps S132 to S134, given a set of images each labeled with a single category, the categories of a new set of test images are predicted and the accuracy of the prediction is measured; this is an image classification problem. The image classification algorithm may be decomposed into the following steps:
(1) The input is a training set of N images covering K classes in total, each image being labeled with one of the classes. In an embodiment of the present invention, N pictures of vehicle violations may be input and labeled, for example a picture of a vehicle driving over a solid line, labeled as a solid-line violation. Enough violation pictures of each such scene are input and labeled in the same way.
(2) A classifier is then trained using the training set to learn the appearance of each class. The feature vector is input into a softmax classifier to obtain the classification result of the image, wherein the softmax classifier is a classifier whose classification training has been completed.
(3) Finally, class labels of a group of new images are predicted, and the performance of the classifier is evaluated according to whether the predicted result of the classifier is correct or not.
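The prediction and evaluation steps can be illustrated with a small softmax classifier over a feature vector. This sketch assumes a single linear layer with already-trained weights; the weights, biases, and labels in any usage are placeholders, and training itself is not shown.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(
    feature: Sequence[float],
    weights: Sequence[Sequence[float]],   # one row per class (assumed trained)
    biases: Sequence[float],
    labels: Sequence[str],
) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    logits = [
        sum(w * x for w, x in zip(row, feature)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


def accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of predictions that equal the expected labels (step 3)."""
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)
```

The softmax output is a probability distribution over the K classes, so the predicted class is simply the index with the largest probability.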
The model is created by training on a large amount of labeled data. Through its learning ability the model learns automatically, and when a new problem is encountered, the model is called to give the best solution.
In step S140, whether the acquired image belongs to the violation image is determined by comparing the classification information corresponding to the acquired image with the violation standard image.
Generally, the database includes standard samples of various illegal images, and the classification information of the acquired images is compared with the standard samples one by one to determine whether the images belong to the illegal images.
In step S150, once the image is determined to be a violation image, the violation scene is reported automatically: the intelligent voice assistant automatically calls the public security system to upload and report the violation video, and feeds the result back to the vehicle owner as either reporting success or reporting failure.
To further explain the reporting method of the present invention, fig. 4 shows the detailed process of automatically reporting a road violation based on voice and image processing. As shown in fig. 4:
S41-S42: the vehicle owner calls the voice assistant;
S43: the voice assistant performs voice processing through NLP;
S44: processing a violation reporting instruction;
S45: processing a non-violation-reporting instruction;
S46: calling the automobile data recorder to obtain a video;
S47: processing the video and matching pictures;
S48: judging whether a violation exists;
S49: if yes, automatically reporting to the public security system;
S50: returning the reporting result;
S51: ending.
According to the method for automatically reporting the road violation based on the voice and image processing, the voice is processed through the natural language processing and the natural language generation, the image shot by the automobile data recorder is processed through the convolutional neural network, and the violation image is obtained, so that the problem that the existing road violation reporting process is complex and tedious and cannot meet the requirements of vehicle owners is solved.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a voice and image processing-based automatic road violation reporting program which, when executed by a processor, implements the following operations:
processing the user voice through natural language processing and natural language generation to obtain the user intention;
acquiring an image to be reported in the automobile data recorder according to the user intention;
processing the image to be reported through a convolutional neural network to obtain classification information corresponding to the image to be reported;
judging whether the image to be reported belongs to an illegal image or not by comparing the classification information corresponding to the image to be reported with a preset illegal standard image;
and if the image to be reported belongs to the illegal image, reporting the image to be reported.
The specific implementation of the computer-readable storage medium of the present invention is substantially the same as the specific implementation of the above-mentioned method for automatically reporting road violation based on voice and image processing and the electronic device, and is not repeated herein.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
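The idea of "data blocks associated by using a cryptographic method" can be illustrated with a minimal hash chain. This is only a sketch of the linking and tamper-detection property, not a blockchain platform: there is no consensus mechanism or peer-to-peer network, and the payload fields used in any example are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index: int, prev_hash: str, payload: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding of the block contents."""
    body = json.dumps(
        {"index": index, "prev": prev_hash, "payload": payload}, sort_keys=True
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict[str, Any]], payload: Dict[str, Any]) -> Dict[str, Any]:
    """Append a block that commits to the hash of the previous block."""
    prev = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    block = {
        "index": index,
        "prev": prev,
        "payload": payload,
        "hash": block_hash(index, prev, payload),
    }
    chain.append(block)
    return block


def verify(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every hash and link; any modification makes this False."""
    prev = GENESIS_PREV
    for i, block in enumerate(chain):
        expected = block_hash(i, prev, block["payload"])
        if block["index"] != i or block["prev"] != prev or block["hash"] != expected:
            return False
        prev = block["hash"]
    return True
```

Because each block's hash covers the previous block's hash, altering a stored record invalidates that block and every later link, which is the anti-counterfeiting property the paragraph refers to.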
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A road violation automatic reporting method based on voice and image processing is applied to an electronic device and comprises the following steps:
processing the user voice through natural language processing and natural language generation to obtain the user intention;
acquiring an image to be reported in the automobile data recorder according to the user intention;
processing the image to be reported through a convolutional neural network to obtain classification information corresponding to the image to be reported;
judging whether the image to be reported belongs to an illegal image or not by comparing the classification information corresponding to the image to be reported with a preset illegal standard image;
and if the image to be reported belongs to the illegal image, reporting the image to be reported.
2. The method for automatically reporting road violation based on voice and image processing as claimed in claim 1,
the step of processing the user voice through natural language processing and natural language generation to obtain the user intention comprises the following steps:
processing the user voice through the natural language processing, and converting the user voice into a text;
and analyzing the converted text through the natural language generation to acquire the text information in the text that represents the user's intention.
3. The method for automatically reporting road violation based on voice and image processing as claimed in claim 2, wherein,
the step of processing the user speech through the natural language processing, and converting the user speech into text, includes:
preprocessing the user voice and then extracting features;
and carrying out mode matching on the extracted features and the voice signals in the voice model library to realize the conversion from the voice of the user to the text.
4. The method for automatically reporting road violation based on voice and image processing as claimed in claim 2, wherein,
the step of analyzing the converted text through the natural language generation to obtain the text information representing the user's intention includes:
context understanding and semantic disambiguation are carried out on a plurality of phrases of a received text by combining a context by utilizing the built deep learning model, and semantic results of the phrases are obtained;
comparing the semantic results of the phrases with phrases of a knowledge graph respectively to obtain a similarity value of each phrase;
taking the phrase with the highest similarity value as the semantic result of each phrase, and further acquiring the semantic results of a plurality of phrases;
and combining the semantic results of the plurality of word groups to generate a semantic understanding result of the character information, and acquiring the character information intended by the user according to the semantic understanding result.
5. The method for automatically reporting road violation based on voice and image processing as claimed in claim 1,
the step of processing the acquired images in the automobile data recorder through the convolutional neural network to acquire the classification information corresponding to the images comprises the following steps of:
preprocessing the acquired images in the automobile data recorder;
extracting image features of the preprocessed image through a convolutional neural network to obtain the features of each element image, and quantizing the features to obtain feature vectors;
acquiring target classification characteristic information according to the characteristic vectors, wherein the target classification characteristic information comprises a plurality of target categories and target characteristic vectors corresponding to each target category;
determining the classification corresponding to the image according to the element image and the target characteristics in the target classification characteristic information; and matching the element images with the target features in the target classification feature information to obtain the category information corresponding to each element image.
6. The method for automatically reporting road violation based on voice and image processing as claimed in claim 5, wherein the predetermined violation standard image is stored in a block chain,
the step of preprocessing the acquired images in the automobile data recorder comprises the following steps:
according to the length-width ratio information of the images to be processed, grouping the images to be processed and obtaining a plurality of groups of images to be processed;
setting template image information respectively corresponding to each group of images to be processed, wherein the template image information comprises width information and height information;
carrying out equal-scale amplification or reduction on all images to be processed in the same group until the width of the images to be processed is not larger than the width information of the template image and the height of the images to be processed is not larger than the height information of the template image;
and taking the template image as a frame, and carrying out configuration processing on the image to be processed after the equal-scale enlargement or reduction.
7. A road violation automatic reporting system based on voice and image processing is characterized by comprising:
the user intention acquisition module is used for processing the user voice through natural language processing and natural language generation to acquire the user intention;
the image acquisition module is used for acquiring an image to be reported in the automobile data recorder according to the user intention;
the classification information acquisition module is used for processing the image to be reported through a convolutional neural network to acquire classification information corresponding to the image to be reported;
the judgment result acquisition module is used for judging whether the image to be reported belongs to an illegal image or not by comparing the classification information corresponding to the image to be reported with a preset illegal standard image;
and the illegal reporting processing module is used for reporting the image to be reported if the image to be reported belongs to an illegal image.
8. An electronic device, comprising: the device comprises a memory and a processor, wherein the memory comprises a voice and image processing-based road violation automatic reporting program, and the voice and image processing-based road violation automatic reporting program is executed by the processor to realize the following steps:
processing the user voice through natural language processing and natural language generation to obtain the user intention;
acquiring an image to be reported in the automobile data recorder according to the user intention;
processing the image to be reported through a convolutional neural network to obtain classification information corresponding to the image to be reported;
judging whether the image to be reported belongs to an illegal image or not by comparing the classification information corresponding to the image to be reported with a preset illegal standard image;
and if the image to be reported belongs to the illegal image, reporting the image to be reported.
9. The electronic device of claim 8,
the step of processing the acquired images in the automobile data recorder through the convolutional neural network to acquire the classification information corresponding to the images comprises the following steps of:
preprocessing the acquired images in the automobile data recorder;
extracting image features of the preprocessed image through a convolutional neural network to obtain the features of each element image, and quantizing the features to obtain feature vectors;
acquiring target classification characteristic information according to the characteristic vectors, wherein the target classification characteristic information comprises a plurality of target categories and target characteristic vectors corresponding to each target category;
determining the classification corresponding to the image according to the element image and the target characteristics in the target classification characteristic information; and matching the element images with the target features in the target classification feature information to obtain the category information corresponding to each element image.
10. A computer-readable storage medium, wherein the computer-readable storage medium includes a speech and image processing-based automatic road violation reporting program, and when the speech and image processing-based automatic road violation reporting program is executed by a processor, the steps of the speech and image processing-based automatic road violation reporting method according to any one of claims 1 to 6 are implemented.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010344544.5A CN111613225A (en) 2020-04-27 2020-04-27 Method and system for automatically reporting road violation based on voice and image processing

Publications (1)

Publication Number Publication Date
CN111613225A true CN111613225A (en) 2020-09-01

Family

ID=72201192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010344544.5A Pending CN111613225A (en) 2020-04-27 2020-04-27 Method and system for automatically reporting road violation based on voice and image processing

Country Status (1)

Country Link
CN (1) CN111613225A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114694372A (en) * 2020-12-31 2022-07-01 宝能汽车集团有限公司 Active identification method for vehicle violation, vehicle-mounted multimedia and active identification system for vehicle violation

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105632183A (en) * 2016-01-27 2016-06-01 福建工程学院 Vehicle violation behavior proof method and system thereof
CN106295541A (en) * 2016-08-03 2017-01-04 乐视控股(北京)有限公司 Vehicle type recognition method and system
CN107491764A (en) * 2017-08-25 2017-12-19 电子科技大学 A kind of violation based on depth convolutional neural networks drives detection method
CN107808132A (en) * 2017-10-23 2018-03-16 重庆邮电大学 A kind of scene image classification method for merging topic model
CN208000676U (en) * 2018-04-12 2018-10-23 南京信息工程大学 A kind of online vehicular traffic prosecution system violating the regulations
CN109166284A (en) * 2018-09-11 2019-01-08 广东省电子技术研究所 A kind of unlawful practice alarm system and unlawful practice alarm method
US20190220692A1 (en) * 2017-07-24 2019-07-18 Yi Tunnel (Beijing) Technology Co., Ltd. Method and apparatus for checkout based on image identification technique of convolutional neural network
CN110046547A (en) * 2019-03-06 2019-07-23 深圳市麦谷科技有限公司 Report method, system, computer equipment and storage medium violating the regulations
CN110335595A (en) * 2019-06-06 2019-10-15 平安科技(深圳)有限公司 Slotting based on speech recognition asks dialogue method, device and storage medium
CN110415529A (en) * 2019-09-04 2019-11-05 上海眼控科技股份有限公司 Automatic processing method, device, computer equipment and the storage medium of vehicle violation
CN110533912A (en) * 2019-09-16 2019-12-03 腾讯科技(深圳)有限公司 Driving behavior detection method and device based on block chain

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wu Yuzhi (吴玉枝) et al.: "Illegal parking event detection based on convolutional neural network", Modern Computer (现代计算机), pages 22 - 26 *

Similar Documents

Publication Publication Date Title
CN110033018B (en) Graph similarity judging method and device and computer readable storage medium
CN107944450B (en) License plate recognition method and device
WO2020005731A1 (en) Text entity detection and recognition from images
CN112329659A (en) Weak supervision semantic segmentation method based on vehicle image and related equipment thereof
CN111695439A (en) Image structured data extraction method, electronic device and storage medium
CN110807314A (en) Text emotion analysis model training method, device and equipment and readable storage medium
CN110058838B (en) Voice control method, device, computer readable storage medium and computer equipment
WO2024041479A1 (en) Data processing method and apparatus
CN112528894A (en) Method and device for distinguishing difference items
CN113139403A (en) Violation behavior identification method and device, computer equipment and storage medium
CN111695604A (en) Image reliability determination method and device, electronic equipment and storage medium
CN111914076A (en) User profile construction method, system, terminal and storage medium based on man-machine conversation
CN111783471A (en) Semantic recognition method, device, equipment and storage medium of natural language
CN110428816B (en) Method and device for training and sharing voice cell bank
CN110472655B (en) Marker machine learning identification system and method for cross-border travel
CN114677650A (en) Intelligent analysis method and device for pedestrian illegal behaviors of subway passengers
Gunawan et al. Performance Evaluation of Automatic Number Plate Recognition on Android Smartphone Platform.
CN111613225A (en) Method and system for automatically reporting road violation based on voice and image processing
CN111062388B (en) Advertisement character recognition method, system, medium and equipment based on deep learning
CN112926700A (en) Class identification method and device for target image
CN108897739B (en) Intelligent automatic mining method and system for application flow identification characteristics
US20230237816A1 (en) Adaptive text recognition
CN115578736A (en) Certificate information extraction method, device, storage medium and equipment
CN115984886A (en) Table information extraction method, device, equipment and storage medium
CN116010545A (en) Data processing method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination