CN107346629A

CN107346629A - Intelligent reading method for the blind and intelligent reader system for the blind

Info

Publication number: CN107346629A
Application number: CN201710739516.1A
Authority: CN
Inventors: 刘宇红; 蒋明怀; 张荣芬; 张达峰
Original assignee: Guizhou University
Current assignee: Guizhou University
Priority date: 2017-08-22
Filing date: 2017-08-22
Publication date: 2017-11-14

Abstract

The invention discloses an intelligent reading method for the blind and an intelligent reader system for the blind. An image capture module acquires image information of a text, and a USB diverter module uploads that image information to a cloud server through a communication module. The cloud server performs preprocessing, text segmentation and text recognition on the image information and sends the recognition result to the USB diverter module in text format. The USB diverter module forwards the result to a central control module, which converts the text-format recognition result into voice information through the audio processing unit of a voice broadcast module and sends it to a Bluetooth unit; the earphone or loudspeaker of the Bluetooth unit then reads the text aloud to the reader. Compared with the prior art, the invention recognizes the eight fonts commonly used for printed Chinese characters as well as handwritten Chinese characters, and offers high reading efficiency and high reading accuracy.

Description

Intelligent reading method for the blind and intelligent reader system for the blind

Technical field

The present invention relates to a reader system, and in particular to an intelligent reading and recognition method for the blind and an intelligent reader system for the blind.

Background art

According to Fact Sheet No. 282 of the World Health Organization (WHO), updated in August 2014, about 285 million people worldwide are visually impaired, of whom 39 million are blind and 246 million have low vision; developing countries account for about 90% of the world's visually impaired. In China, the Second National Sample Survey on Disability was carried out in 2006. According to its results, China had 82.96 million people with disabilities of all kinds in 2006, of whom 12.33 million had a visual disability. Based on the national population from the Sixth National Census, together with the share of people with disabilities in the national population and the share of each disability category among all people with disabilities found by the Second National Sample Survey on Disability, the number of people with disabilities in China at the end of 2010 is estimated at 85.02 million, of whom about 12.63 million have a visual disability.

Clearly, the visually impaired population is very large. As living standards improve, reading is increasingly a strong demand. At the same time, with the development of modern science and technology, daily life has entered a new era in which the volume of information keeps growing and the ways information circulates keep diversifying. This can be called the era of "barrier-free information": everyone, able-bodied or disabled, young or old, can benefit from information technology, and anyone can obtain and use information equally, conveniently and without barriers under any circumstances. "Barrier-free information", also known as "information accessibility", aims to give all members of society, including people with physical disabilities, the elderly and children, an equal opportunity to obtain and use information. The achievements of modern science, technology and civilization should benefit everyone. The blind are a group in urgent need of assistance, and they too have a pressing desire for information from the outside world. At present, however, blind people learn essentially by reading Braille with their fingers. This is not only inefficient; the accuracy with which the content is read is also low, which makes learning very difficult for the blind. Existing intelligent readers for the blind cannot adequately recognize the eight fonts commonly used for printed Chinese characters (Song, Kai (regular script), Li (clerical script), Hei, YouYuan, Huawen Xingkai, Huawen Xinwei and Huawen Shuti), nor can they recognize handwritten Chinese characters.

Therefore, existing reading aids for the blind cannot recognize the Song, Kai, Li, Hei, YouYuan, Huawen Xingkai, Huawen Xinwei and Huawen Shuti fonts commonly used for printed Chinese characters, cannot recognize handwritten Chinese characters, and suffer from low reading efficiency and low reading accuracy.

Summary of the invention

The object of the present invention is to provide an intelligent reader system for the blind. The invention recognizes the Song, Kai, Li, Hei, YouYuan, Huawen Xingkai, Huawen Xinwei and Huawen Shuti fonts commonly used for printed Chinese characters, recognizes handwritten Chinese characters, and offers high reading efficiency and high reading accuracy.

Technical solution of the invention: an intelligent reading method for the blind, in which an image capture module acquires image information of a text, and a USB diverter module uploads the image information to a cloud server through a communication module; the cloud server performs preprocessing, text segmentation and text recognition on the received image information and sends the recognition result to the USB diverter module in text format; the USB diverter module forwards the result to a central control module, which converts the text-format recognition result into voice information through the audio processing unit of a voice broadcast module and sends it to a Bluetooth unit; the earphone or loudspeaker of the Bluetooth unit reads the text aloud to the reader.

In the foregoing intelligent reading method for the blind, the cloud server performs image preprocessing, text segmentation and text recognition by means of an image analysis system.

In the foregoing intelligent reading method for the blind, the text recognition is performed with a trained character model; a convolutional neural network from deep learning is used as the training model for single-character recognition.

An intelligent reader system for the blind implementing the foregoing intelligent reading method comprises a central control module; the central control module is connected to a voice broadcast module and to a USB diverter module; the USB diverter module is connected to a cloud server through a communication module and is also connected to an image capture module.

In the foregoing intelligent reader system for the blind, the voice broadcast module comprises an audio processing unit, and the audio processing unit is connected to a Bluetooth unit.

In the foregoing intelligent reader system for the blind, the speech synthesis unit comprises a serial transceiver, a speech synthesizer and a voice output circuit, and the Bluetooth unit comprises a power amplifier and a loudspeaker; the input of the serial transceiver is connected to UART3 of the central control module, the output of the serial transceiver is connected to the voice output circuit through the speech synthesizer, and the output of the voice output circuit is connected to the loudspeaker through the power amplifier of the Bluetooth unit.

In the foregoing intelligent reader system for the blind, the central control module comprises a main control chip; the main control chip is connected to UART3 and UART2 through a caching, processing and control unit; UART2 is connected to a reserved serial port; the caching, processing and control unit is also connected to a USB HOST 2.0 interface, and the USB HOST 2.0 interface is connected to the upstream port of the USB diverter module.

In the foregoing intelligent reader system for the blind, the USB diverter module comprises a multi-port transceiver control unit, which is connected to the upstream port, to a switching, conversion, caching and processing unit, and to USB1, USB3 and USB2; USB1 is connected to the camera of the image capture module, USB3 is connected to a spare interface of the image capture module, and USB2 is connected to the USB interface of the communication module.

In the foregoing intelligent reader system for the blind, the communication module comprises a 4G communication chip, which is connected to a radio-frequency antenna, a SIM card and a USB interface; the radio-frequency antenna is connected to the cloud server.

Beneficial effects of the invention: compared with the prior art, the present invention combines cutting-edge technologies such as machine vision, digital image processing, deep learning and computer networking. It not only recognizes printed Chinese characters but also recognizes eight fonts, including Song, Kai, Li and Hei, and it can additionally recognize handwritten Chinese characters. It removes the limitation that blind people traditionally have to read books by touch with their fingers. The invention thus recognizes the Song, Kai, Li, Hei, YouYuan, Huawen Xingkai, Huawen Xinwei and Huawen Shuti fonts commonly used for printed Chinese characters, recognizes handwritten Chinese characters, and offers high reading efficiency and high reading accuracy.

Brief description of the drawings

Fig. 1 is a schematic block diagram of the system of the present invention;

Fig. 2 is a structural block diagram of the core hardware in Fig. 1;

Fig. 3 is a working principle diagram of the voice broadcast module in Fig. 1;

Fig. 4 is a working principle diagram of the communication module in Fig. 1;

Fig. 5 is a working principle diagram of the USB diverter module in Fig. 1;

Fig. 6 is an operational flow chart of the present invention;

Fig. 7 shows the character images generated for the various fonts;

Fig. 8 is a simplified diagram of the CNN model;

Fig. 9 is a diagram of the network model used for character training;

Fig. 10 is a schematic diagram of the SoftmaxWithLoss layer;

Fig. 11 shows the relationship between accuracy and the number of iterations.

Detailed description of the embodiments

The present invention is further described below with reference to the accompanying drawings and an embodiment, which are not to be taken as a basis for limiting the invention.

Embodiment. An intelligent reading method for the blind and an intelligent reader system for the blind are constituted as shown in Figs. 1 to 11. In the intelligent reading method for the blind, an image capture module acquires image information of a text, and a USB diverter module uploads the image information to a cloud server through a communication module; the cloud server performs preprocessing, text segmentation and text recognition on the received image information and sends the recognition result to the USB diverter module in text format; the USB diverter module forwards the result to a central control module, which converts the text-format recognition result into voice information through the audio processing unit of a voice broadcast module and sends it to a Bluetooth unit; the earphone or loudspeaker of the Bluetooth unit reads the text aloud to the reader.

The cloud server performs image preprocessing, text segmentation and text recognition by means of an image analysis system.

The text recognition is performed with a trained character model; a convolutional neural network from deep learning is used as the training model for single-character recognition.

The intelligent reader system for the blind implementing the described intelligent reading method comprises a central control module; the central control module is connected to a voice broadcast module and to a USB diverter module; the USB diverter module is connected to a cloud server through a communication module and is also connected to an image capture module.

The voice broadcast module comprises an audio processing unit, and the audio processing unit is connected to a Bluetooth unit.

The speech synthesis unit comprises a serial transceiver, a speech synthesizer and a voice output circuit, and the Bluetooth unit comprises a power amplifier and a loudspeaker; the input of the serial transceiver is connected to UART3 of the central control module, the output of the serial transceiver is connected to the voice output circuit through the speech synthesizer, and the output of the voice output circuit is connected to the loudspeaker through the power amplifier of the Bluetooth unit.

The central control module comprises a main control chip; the main control chip is connected to UART3 and UART2 through a caching, processing and control unit; UART2 is connected to a reserved serial port; the caching, processing and control unit is also connected to a USB HOST 2.0 interface, and the USB HOST 2.0 interface is connected to the upstream port of the USB diverter module.

The USB diverter module comprises a multi-port transceiver control unit, which is connected to the upstream port, to a switching, conversion, caching and processing unit, and to USB1, USB3 and USB2; USB1 is connected to the camera of the image capture module, USB3 is connected to a spare interface of the image capture module, and USB2 is connected to the USB interface of the communication module.

The communication module comprises a 4G communication chip, which is connected to a radio-frequency antenna, a SIM card and a USB interface; the radio-frequency antenna is connected to the cloud server.

The software design of the invention consists mainly of the program on the cloud server and the program on the front end of the reader. The cloud server software preprocesses the image captured by the front end, segments the text, uses the trained model to recognize the characters, and finally sends the recognition result back to the main control chip of the front end. The front-end software mainly comprises the programs for the functional modules: capture of the text image by the camera, transmission of the image and reception of the recognition result by the communication module, reading out of the recognition result by the voice broadcast module, and configuration of the key interrupts.

The essence of the invention is the recognition of the text image captured by the image capture module. The key technology of text recognition lies in three stages: preprocessing of the captured image, text segmentation and text recognition. If any one stage processes the image poorly, the final recognition result suffers.

Image preprocessing:

Because the text in the image has to be cut apart, the gray values of the text rows and of the line spacing (and of the columns and the column spacing) should differ as much as possible, noise should be as low as possible, rows (and columns) should not overlap, and the text as a whole should not be tilted or distorted. Image preprocessing therefore covers five aspects: graying, denoising, skew correction, edge detection and contrast enhancement.

Graying:

Image binarization, here also referred to as graying, gives the image a clear black-and-white appearance. An ordinary color image is a spatial model with three RGB channels: the color of a pixel is determined only once all three channels have been assigned, for example (0, 0, 0) for black. For a given color it is hard to tell what the individual R, G and B values are, and three-channel arithmetic makes pixel-level processing relatively inefficient. In a single-channel gray-scale image, by contrast, the pixel value of a color is easy to determine.

Binarization converts the picture into a gray-scale picture. This makes processing more efficient and fixes a unique pixel value for each color, so that the gray values of text rows and blank rows differ clearly.

After this step the text picture visibly contains only two colors, gray and black, and the gray values of the rows and the line spacing (columns and column spacing) differ clearly, as intended.
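The graying step can be illustrated with a minimal NumPy sketch. The luma weights (0.299, 0.587, 0.114) and the fixed threshold of 128 are assumptions chosen for illustration; the source does not state which conversion or threshold the system uses.

```python
import numpy as np

def to_gray(rgb):
    """Collapse an H x W x 3 RGB array to one channel (assumed BT.601 luma weights)."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.rint(rgb.astype(np.float64) @ weights).astype(np.uint8)

def binarize(gray, threshold=128):
    """Map every pixel to 0 or 255 around a fixed threshold (assumed value)."""
    return np.where(gray >= threshold, 255, 0).astype(np.uint8)

# Tiny 2 x 2 example: black, white, dark red, dark gray.
rgb = np.array([[[0, 0, 0], [255, 255, 255]],
                [[200, 30, 30], [20, 20, 20]]], dtype=np.uint8)
gray = to_gray(rgb)
binary = binarize(gray)
```

After conversion each pixel carries a single value, which is what makes the later row and column sums straightforward.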

Denoising:

Real digital images are often affected by noise from the external environment and the imaging equipment during digitization and transmission; such images are called noisy images. The process of reducing noise in a digital image is called image denoising. Denoising can effectively improve recognition. There are many denoising methods; the most common are the mean filter, the adaptive Wiener filter, the median filter and wavelet denoising.

The invention mainly uses median filtering for denoising. Median filtering is a nonlinear smoothing technique that sets the gray value of each pixel to the median of the gray values of all pixels in a neighborhood window around that pixel.
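A minimal sketch of median filtering follows, assuming a 3 x 3 window and edge replication at the borders; the source does not specify either choice.

```python
import numpy as np

def median_filter(gray, size=3):
    """Replace each pixel by the median of its size x size neighborhood."""
    pad = size // 2
    padded = np.pad(gray, pad, mode="edge")  # replicate border pixels (assumption)
    h, w = gray.shape
    # One shifted copy of the image per position inside the window.
    shifted = [padded[dy:dy + h, dx:dx + w]
               for dy in range(size) for dx in range(size)]
    return np.median(np.stack(shifted), axis=0).astype(gray.dtype)

# A flat image with one salt-noise pixel in the middle.
noisy = np.full((5, 5), 10, dtype=np.uint8)
noisy[2, 2] = 255
clean = median_filter(noisy)
```

Because a single outlier never reaches the middle of the sorted window, the isolated bright pixel is removed while flat regions are left unchanged.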

Skew correction:

When a picture is taken or otherwise acquired, the text in it is usually tilted to some degree. Skew correction is important for the accuracy of row segmentation. The key to correcting the skew is detecting the direction and angle by which the characters are tilted. Commonly used skew detection algorithms include conventional line detection, projection-based methods, methods based on the Hough transform, and analysis in the frequency domain after a Fourier transform. Study and testing showed that for images intended for OCR that contain only characters such as text, digits and letters, the Hough transform method works best.

The Hough-based procedure builds an image pyramid, takes the top-level image, extracts the image edges and then applies the transform; this sequence of steps detects the tilt angle and direction of the image fairly accurately, after which the image is rotated to correct it. Test results show that this algorithm is the most accurate.
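The voting step at the heart of the Hough transform can be sketched as below. This is an illustrative sketch only: it omits the image pyramid, edge extraction and rotation described above, and the function name `dominant_line_angle` and the 1-degree angular resolution are assumptions.

```python
import numpy as np

def dominant_line_angle(binary, angle_step_deg=1.0):
    """Return the normal angle theta (degrees) of the strongest straight line."""
    ys, xs = np.nonzero(binary)
    thetas = np.deg2rad(np.arange(0.0, 180.0, angle_step_deg))
    diag = int(np.ceil(np.hypot(*binary.shape)))
    # Each foreground pixel votes for rho = x*cos(theta) + y*sin(theta).
    rhos = np.rint(np.outer(xs, np.cos(thetas)) +
                   np.outer(ys, np.sin(thetas))).astype(int) + diag
    acc = np.zeros((2 * diag + 1, thetas.size), dtype=np.int64)
    cols = np.broadcast_to(np.arange(thetas.size), rhos.shape)
    np.add.at(acc, (rhos, cols), 1)
    _, best = np.unravel_index(np.argmax(acc), acc.shape)
    return float(np.rad2deg(thetas[best]))

# A perfectly horizontal stroke has its normal at 90 degrees, i.e. zero skew.
img = np.zeros((50, 50), dtype=np.uint8)
img[10, :] = 255
skew = dominant_line_angle(img) - 90.0
```

For text lines, the deviation of the dominant angle from 90 degrees gives the skew by which the image would then be rotated back.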

Contrast enhancement:

An image whose gray values are evenly distributed looks better than images with other distributions. Uneven gray values show up as a background whose gray values span a very wide range while the character color likewise spans a wide range. For this case, histogram equalization can be applied to bring out the contrast between characters and background.

Image contrast strongly affects segmentation. If the text rows contrast strongly with the line spacing, the histogram shows pronounced peaks and troughs during detection; otherwise it may be little more than a polyline that rises and then falls. Clearly, when the histogram of pixel values shows distinct peaks and troughs, it is easy to cut at those positions. The purpose of contrast enhancement is therefore to increase the gray-value difference between text rows and line spacing and to reduce the gray-value variation within each.

For pictures with unevenly distributed gray values, contrast enhancement improves recognition markedly.
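Histogram equalization can be sketched in a few lines of NumPy. This is the textbook cumulative-distribution mapping for 8-bit images, offered as an assumption about what the equalization step looks like rather than as the system's actual code.

```python
import numpy as np

def equalize_histogram(gray):
    """Spread the gray levels of an 8-bit image over the full 0-255 range."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    if gray.size == cdf_min:          # constant image: nothing to stretch
        return gray.copy()
    lut = np.rint((cdf - cdf_min) / (gray.size - cdf_min) * 255)
    lut = lut.clip(0, 255).astype(np.uint8)
    return lut[gray]                  # look up the new value of every pixel

# Four gray levels squeezed into 100-130 are stretched to 0-255.
low_contrast = np.array([[100, 110], [120, 130]], dtype=np.uint8)
stretched = equalize_histogram(low_contrast)
```

The narrow band of input levels ends up evenly spaced across the whole range, which is the effect that makes text rows stand out from the line spacing.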

Edge detection:

Edge detection marks the points in a digital image at which brightness changes sharply. It is a basic problem in image processing and computer vision and is almost indispensable in character recognition.

Image edge detection discards information irrelevant to recognition, saves recognition time by greatly reducing the amount of data, and preserves the important structural attributes of the image, which improves the efficiency of row segmentation when the text picture is imported into the reader for recognition.
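The source does not name an edge operator. As one common possibility, the sketch below computes the Sobel gradient magnitude with NumPy; the choice of Sobel and of edge-replicated borders are both assumptions.

```python
import numpy as np

def sobel_magnitude(gray):
    """Gradient magnitude from 3 x 3 Sobel kernels (assumed operator)."""
    g = np.pad(gray.astype(np.float64), 1, mode="edge")
    # Right column minus left column, weighted 1-2-1.
    gx = (g[:-2, 2:] + 2 * g[1:-1, 2:] + g[2:, 2:]) \
       - (g[:-2, :-2] + 2 * g[1:-1, :-2] + g[2:, :-2])
    # Bottom row minus top row, weighted 1-2-1.
    gy = (g[2:, :-2] + 2 * g[2:, 1:-1] + g[2:, 2:]) \
       - (g[:-2, :-2] + 2 * g[:-2, 1:-1] + g[:-2, 2:])
    return np.hypot(gx, gy)

# Left half black, right half white: the response sits on the boundary.
step = np.zeros((4, 6), dtype=np.uint8)
step[:, 3:] = 255
edges = sobel_magnitude(step)
```

Only the two columns adjacent to the brightness jump respond; uniform regions give zero, which is how edge detection cuts down the data passed on to segmentation.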

Text segmentation:

The preprocessing above yields a contrast-enhanced binary picture, which can now be segmented. Segmentation consists of the following steps: reading the gray-scale image and binarizing it, splitting the text into rows, splitting the rows into columns, and saving each character as a small JPG image.

Reading the gray-scale image and binarizing it:

In memory, the pixels of a picture are stored as a two-dimensional matrix; because the picture is a gray-scale image, the matrix has a single channel. The matrix holds the color information of the picture. Color values range from 0 to 255, with black at 0, white at 255 and the intermediate gray levels spread evenly in between. In OpenCV, the imread function can be called directly to read the picture into a Mat matrix as a gray-scale image.

Most values clearly fall either in 0-5 or in 250-255. Values of 0-5 represent the background color, and the interspersed values of 250-255 represent the color of the white characters. The image is processed this way in preparation for the subsequent segmentation.

Text segmentation: the analysis above shows that the image at this stage is white text on a black background, with black pixel values in 0-5 and white text pixel values in 250-255. To separate the characters, pixels with values of 0-5 are normalized to 0 and pixels with values of 250-255 are normalized to 255. The picture can then be split into text rows using the sum of the pixel values in each row: a row containing only black background sums to 0, while a row containing white text sums to more than 0. This rule splits the text into rows.
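The row-sum rule can be sketched as follows. The helper names `normalize`, `split_runs` and `segment_rows` are hypothetical; only the 0-5 and 250-255 bands and the "row sum greater than 0" test come from the text.

```python
import numpy as np

def normalize(gray, low=5, high=250):
    """Snap near-black pixels to 0 and near-white pixels to 255."""
    out = gray.copy()
    out[gray <= low] = 0
    out[gray >= high] = 255
    return out

def split_runs(profile):
    """Return (start, stop) pairs for each run of non-zero entries."""
    filled = np.concatenate(([0], (profile > 0).astype(np.int8), [0]))
    edges = np.diff(filled)
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), stops.tolist()))

def segment_rows(image):
    """Split white-on-black text into rows using per-row pixel sums."""
    return split_runs(normalize(image).sum(axis=1))

# Two text rows (rows 1-2 and row 5) plus one speck of background noise.
page = np.zeros((8, 6), dtype=np.uint8)
page[1:3, 1:5] = 252
page[5, 2:4] = 255
page[4, 0] = 3
rows = segment_rows(page)
```

Each returned pair is a half-open range of row indices, so `page[start:stop]` yields one text line ready for column segmentation.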

Each text row obtained from row segmentation is saved, and the individual characters are then cut out using column pixel sums (a column containing only black background sums to 0; a column containing white text sums to more than 0). This method may split a Chinese character with a left-right structure into two parts, so the text picture is dilated before it is split; column segmentation is then carried out and the result is converted to black text on a white background.
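The effect of dilating before the column split can be seen in a small sketch. The horizontal-only dilation and the radius of one pixel are assumptions for illustration; the source does not give the structuring element it uses.

```python
import numpy as np

def dilate_horizontally(binary, radius=1):
    """Grow white pixels sideways by `radius` columns (assumed structuring element)."""
    padded = np.pad(binary, ((0, 0), (radius, radius)))
    width = binary.shape[1]
    shifted = [padded[:, d:d + width] for d in range(2 * radius + 1)]
    return np.max(np.stack(shifted), axis=0)

def segment_columns(line, radius=1):
    """Split one text row into characters using per-column pixel sums."""
    profile = dilate_horizontally(line, radius).sum(axis=0)
    filled = np.concatenate(([0], (profile > 0).astype(np.int8), [0]))
    edges = np.diff(filled)
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), stops.tolist()))

# First character has a left part (col 1) and right part (col 3);
# second character occupies cols 7-8.
line = np.zeros((3, 10), dtype=np.uint8)
line[:, 1] = 255
line[:, 3] = 255
line[:, 7:9] = 255
without_dilation = segment_columns(line, radius=0)
with_dilation = segment_columns(line, radius=1)
```

Without dilation the left-right character falls apart into two pieces; with dilation the one-pixel gap is bridged and two characters are found. The returned bounds are widened by the dilation radius, which a real pipeline would have to account for when cropping.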

Text recognition:

Once the image preprocessing and text segmentation described above are complete, the segmented characters have to be recognized.

Acquisition of training data:

Training a good model requires enough data. This design recognizes English characters, Arabic numerals and commonly used Chinese characters: 52 English characters (A-Z, a-z), 10 Arabic numerals (0-9) and 3500 commonly used Chinese characters. Chinese characters come in fonts such as Song and Hei, which makes Chinese character recognition harder (in this design, only the Song, Kai, Li, Hei, YouYuan, Huawen Xingkai, Huawen Xinwei and Huawen Shuti fonts are trained). The training data are 40*40 single-character images generated by a program written in Python. Five images are generated for each character in each font, giving 142480 images in total as the training set; a further 28496 images are generated at random as the test set to verify the feasibility of the model. The characters generated for the various fonts are shown in Fig. 7.
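The image counts quoted above follow directly from the class, font and sample counts, as the short check below shows. The variable names are illustrative; the numbers are those stated in the text.

```python
# Character classes stated in the text.
CLASSES = {"chinese": 3500, "english_letters": 52, "digits": 10}
FONTS = 8               # Song, Kai, Li, Hei, YouYuan and three Huawen fonts
SAMPLES_PER_FONT = 5    # images generated per character per font

num_classes = sum(CLASSES.values())
train_images = num_classes * FONTS * SAMPLES_PER_FONT
test_images = 28496     # figure quoted in the text

print(num_classes)                  # 3562
print(train_images)                 # 142480
print(train_images // test_images)  # 5
```

The test set size equals 3562 x 8, that is, one image per character per font on average, giving a 5:1 ratio of training to test images.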

Selection of the training model:

The invention uses a convolutional neural network (CNN) from deep learning as the training model for single-character recognition. The CNN (Fig. 8) is a kind of artificial neural network whose structure more closely resembles biological neural networks. Compared with a traditional artificial neural network, a CNN model can, first, take the original image directly as input and extract image features automatically; second, it generalizes better, so that deformation or noise in the image does not significantly affect the recognition result; and third, it reduces the complexity of the network model through local receptive fields and weight sharing while achieving higher accuracy than conventional models.

The training network is modeled on the network used for MNIST handwritten digit recognition. Chinese characters, however, have a more complex and finer structure, so network parameters such as the parameter settings and the network model configuration were adjusted on that basis to make the network better suited to training on and recognizing Chinese characters. The training network model is shown in Fig. 9.

Neurons in adjacent layers of a CNN are not fully connected but locally connected: each neuron is connected to only some of the neurons in the previous layer. The CNN model in this recognition system uses the ReLU activation function, which is a non-saturating activation function. When a neural network back-propagates errors, gradients can vanish: every layer multiplies the gradient by the first derivative of the activation function, so when the network has many layers the gradient G decays continually until it disappears. ReLU does not saturate on [0, +∞). This design therefore uses the ReLU function as the activation function:

$$f(x) = \max(0, x)$$

Compared with the traditional sigmoid and tanh functions, the ReLU function can improve model performance considerably.
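The vanishing-gradient argument can be made concrete by multiplying the activation derivative across several layers. The depth of 10 layers and the input value 2.0 below are arbitrary illustrative choices.

```python
import numpy as np

def relu(x):
    """ReLU activation: max(0, x)."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return (np.asarray(x) > 0).astype(np.float64)

def sigmoid_grad(x):
    """Derivative of the sigmoid, which never exceeds 0.25."""
    s = 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=np.float64)))
    return s * (1.0 - s)

layers = 10   # illustrative depth
x = 2.0       # illustrative pre-activation value
relu_factor = relu_grad(x) ** layers
sigmoid_factor = sigmoid_grad(x) ** layers
```

For a positive input the ReLU factor stays at exactly 1 however deep the network is, whereas the sigmoid factor shrinks by roughly an order of magnitude per layer at this input.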

To improve generalization, LRN layers (local response normalization layers) are added. This layer imitates the lateral inhibition mechanism of biological nervous systems: it creates competition among the activities of local neurons so that larger responses become relatively larger still. It performs a kind of "lateral inhibition", normalizing over local regions of the input and smoothing the output of the current layer. Each input value of the layer is divided by

$$\left(1 + \frac{\alpha}{n} \sum_i x_i^2\right)^{\beta}$$

where $n$ is the size of the local region (local_size), $\alpha$ is the scaling factor and $\beta$ is the exponent, with a default value of 5. The formula for the smoothing constraint is:
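A minimal NumPy sketch of across-channel local response normalization is given below. It assumes the common form in which each value is divided by (1 + (alpha / n) * sum of squares over n neighboring channels) raised to beta; the parameter values in the example are chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def lrn(x, local_size=5, alpha=1.0, beta=0.75):
    """Across-channel LRN for an input of shape (channels, height, width)."""
    half = local_size // 2
    # Zero-pad the channel axis so every channel has a full window.
    squared = np.pad(x ** 2, ((half, half), (0, 0), (0, 0)))
    window_sum = sum(squared[i:i + x.shape[0]] for i in range(local_size))
    return x / (1.0 + alpha / local_size * window_sum) ** beta

# Three channels of ones, window of 3, alpha = 3, beta = 1:
# the edge channels see two neighbors, the middle channel sees three.
x = np.ones((3, 1, 1))
y = lrn(x, local_size=3, alpha=3.0, beta=1.0)
```

The middle channel is suppressed more strongly (1/4) than the edge channels (1/3) because more active neighbors fall inside its window, which is the competition effect described above.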

The pooling layers in the network use overlapping pooling in max mode: the maximum over a region of the input data is taken, and the position of the maximum within each small region is recorded; during back-propagation, the residual is passed to the position of the maximum and all other positions are set to zero. The residual formula is:

If the current layer k is a convolutional layer and layer k+1 is the subsampling layer that follows it, the residual of the j-th feature map of layer k is:

where up(x) expands layer k+1 to the size of layer k.
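The max-pooling behavior described above (record the position of each maximum, then route the residual only to that position) can be sketched as follows. The 3 x 3 window with stride 2, which makes neighboring windows overlap, is an assumed configuration; the source does not state the sizes.

```python
import numpy as np

def max_pool_forward(x, kernel=3, stride=2):
    """Overlapping max pooling that also records where each maximum came from."""
    out_h = (x.shape[0] - kernel) // stride + 1
    out_w = (x.shape[1] - kernel) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    argmax = np.empty((out_h, out_w, 2), dtype=np.int64)
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            window = x[top:top + kernel, left:left + kernel]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            out[i, j] = window[r, c]
            argmax[i, j] = (top + r, left + c)
    return out, argmax

def max_pool_backward(grad_out, argmax, input_shape):
    """Pass each residual to the recorded maximum; all other positions stay zero."""
    grad_in = np.zeros(input_shape, dtype=grad_out.dtype)
    for i in range(grad_out.shape[0]):
        for j in range(grad_out.shape[1]):
            r, c = argmax[i, j]
            grad_in[r, c] += grad_out[i, j]
    return grad_in

x = np.arange(25, dtype=np.float64).reshape(5, 5)
pooled, positions = max_pool_forward(x)
grad_in = max_pool_backward(np.ones_like(pooled), positions, x.shape)
```

The backward pass plays the role of the up(x) expansion for max pooling: the small residual map is spread back onto the larger layer, but only at the positions that produced the maxima.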

After N convolutional layers, dropout layers are added after the pooling and ReLU layers of the last few layers to prevent over-fitting. A dropout layer randomly sets the outputs of some nodes to zero and does not update their weights. This prevents certain features from taking effect only in fixed combinations and deliberately makes the network learn general, shared characteristics rather than the peculiarities of particular training samples.
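Dropout can be sketched in a few lines. The version below is the "inverted" variant, which rescales the surviving outputs during training so that nothing needs to change at inference time; the drop probability of 0.5 is an assumed value, and the source does not say which variant or rate it uses.

```python
import numpy as np

def dropout(x, drop_prob, rng, training=True):
    """Zero a random subset of outputs during training (inverted dropout)."""
    if not training or drop_prob == 0.0:
        return x
    keep = rng.random(x.shape) >= drop_prob
    # Rescale survivors so the expected activation is unchanged.
    return x * keep / (1.0 - drop_prob)

rng = np.random.default_rng(0)
activations = np.ones(1000)
dropped = dropout(activations, 0.5, rng)
at_inference = dropout(activations, 0.5, rng, training=False)
```

Nodes whose output is zeroed contribute nothing to the forward pass and therefore receive no weight update for that sample, which is what discourages features from relying on fixed combinations.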

The final output layer uses a SoftmaxWithLoss classifier, as shown in Fig. 10. SoftmaxWithLoss is in fact the combination of a Multinomial Logistic Loss Layer (cross-entropy cost function) and a Softmax Layer. Suppose there are m samples, each with b features; the probability of each of the m samples belonging to each of the n classes is calculated as:

$$b_i = \frac{e^{a_i}}{\sum_{j=1}^{n} e^{a_j}}$$

This maps the n-dimensional real vector $(a_1, a_2, a_3, \dots, a_n)$ to $(b_1, b_2, b_3, \dots, b_n)$; the multi-class decision is then made according to the magnitudes of the $b_i$ (the dimension with the largest weight is taken).
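The combination of softmax and cross-entropy loss can be sketched with NumPy as below. Subtracting the row maximum before exponentiating is a standard numerical-stability step and an addition of this sketch; the function names are illustrative.

```python
import numpy as np

def softmax(a):
    """Map each row of scores to probabilities that sum to 1."""
    shifted = a - a.max(axis=-1, keepdims=True)   # stability; result unchanged
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def softmax_with_loss(scores, labels):
    """Return class probabilities and the mean cross-entropy loss."""
    probs = softmax(scores)
    m = scores.shape[0]
    loss = -np.log(probs[np.arange(m), labels]).mean()
    return probs, loss

# One sample, three classes: the largest score wins.
scores = np.array([[1.0, 3.0, 2.0]])
probs, loss = softmax_with_loss(scores, np.array([1]))
predicted = int(np.argmax(probs, axis=1)[0])
```

Taking the index of the largest probability corresponds to choosing the dimension with the largest weight, and the loss shrinks as the probability assigned to the true class approaches 1.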

The image capture module uses a high-definition camera. The image to be recognized is sent from the front end and propagated forward through the convolutional neural network. The basic principle of forward propagation is as follows:

In the calculation from the input units to the first hidden layer H1, k ranges over all input-layer nodes, $Z_k$ is the weighted sum over all nodes of the preceding layer, and $f(\cdot)$ is a nonlinear function; subsequent layers are computed in the same way.

Input information is processed from the input layer through the hidden layers and passed to the output layer; the state of each layer of neurons affects only the state of the next layer. The error between the network output and the desired output is obtained with the nonlinear activation function of the output layer. The weights $w_{ij}$ from the input layer to the hidden layer are still updated with the BP algorithm. Specifically, the input is processed as follows: the initial weights and thresholds are set to small random numbers; a new input vector and its corresponding output vector are then presented and propagated through the hidden layers to the output layer.

In the forward propagation stage, data originate at the data-reading layer, pass through a number of processing layers and reach the last layer (a loss layer or a feature layer). The weights in the network do not change during this stage. The network path is a directed acyclic graph (DAG): starting from the first node, data pass through the processing layers without encountering any loop, so the data flow advances straight through to the end.

Forward propagation can be studied with data-flow analysis: take a sample (X, Y) from the input data set, where X is the data and Y is the label. X is fed into the network and computed layer by layer to obtain the network output O. The computation performed by the network is:

$$O = F_n(\dots(F_2(F_1(X W_1) W_2)\dots) W_n)$$

where $F_i$, $i = 1, 2, \dots, n$, denotes a nonlinear transformation and $W_i$, $i = 1, 2, \dots, n$, denotes the weights of each weight layer. O is the network output; the pair (Y, O) can be used to assess the quality of the network, and an ideal network satisfies Y == O.
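The layer-by-layer computation can be written as a short loop. In this sketch every nonlinear transformation is taken to be ReLU and biases are omitted; both are simplifying assumptions, and the tiny weight matrices are made up for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(x, weights, activation=relu):
    """Apply one weight matrix and one nonlinearity per layer, in order."""
    out = x
    for w in weights:
        out = activation(out @ w)
    return out

# Two layers: an identity layer followed by a layer that sums its inputs.
X = np.array([[1.0, -1.0]])
W1 = np.eye(2)
W2 = np.array([[1.0], [1.0]])
O = forward(X, [W1, W2])
```

Because the layers form a chain with no loops, a single pass over the list of weights is enough to obtain the output, and the weights themselves are never modified during this stage.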

When training on the character data set, forward propagation first carries the input information to the output layer; the network output and its error are then corrected: the last layer is compared with the target to obtain the loss function, the error update is computed, and the connection weights from the hidden layer to the output layer are adjusted as $w_{jk} = w_{jk} - \alpha (y_k - t_k) h_j$, where $\alpha$ is the learning rate.
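The hidden-to-output update can be applied to all weights at once with an outer product. The sketch below follows the update rule as stated, with h the hidden activations, y the network outputs and t the targets; the numeric values are made up for illustration.

```python
import numpy as np

def update_output_weights(w, h, y, t, alpha):
    """Apply w_jk <- w_jk - alpha * (y_k - t_k) * h_j to every weight."""
    return w - alpha * np.outer(h, y - t)

w = np.zeros((2, 2))          # rows: hidden nodes j, columns: outputs k
h = np.array([1.0, 2.0])      # hidden activations
y = np.array([0.5, 0.5])      # network outputs
t = np.array([0.0, 1.0])      # targets
w_new = update_output_weights(w, h, y, t, alpha=0.1)
```

Weights feeding an output that is too high are decreased and those feeding an output that is too low are increased, in proportion to the activation of the hidden node they come from.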

The connection weights from the input layer to the hidden layer are adjusted as $w_{ij} = w_{ij} - \alpha W_j x_i$, where

Because the images carry a great deal of feature information and the data are complex, training has to iterate repeatedly until the loss converges. Training samples are fed in uniformly during training, and a fairly good recognition rate is eventually achieved.

The text classification task has 3562 target classes (3500 commonly used Chinese characters, 52 upper- and lower-case English letters and 10 Arabic numerals). After sufficient training, the accuracy of the network model rises with the number of iterations to above 99% (as shown in Fig. 11), which shows that this network architecture is satisfactory.

The functional modules used in the present invention are the central control module, the image capture module, the voice broadcast module, the communication module and the USB diverter module.

Central control module: This module aggregates and analyzes the data collected by each functional unit of the blind reader and, according to the situation the data reflect, sends the corresponding control instructions to each functional module. Based on the functional analysis and evaluation of this design, the chip must offer high performance at low power consumption and be suitable for hand-held electronic devices, communication devices or medical equipment, netbooks, learning machines, video surveillance equipment and various human-machine interfaces; it can be used in developing high-definition gaming, wireless GPS, mobile video playback, intelligent control, instrumentation, navigation equipment, PDA devices, remote monitoring and games. The core board supports synchronous audio output over HDMI and the on-board audio chip. The core board must at least meet the "three 1G" requirement (1 GHz clock frequency, 1 GB memory, 1 GB flash): the operating frequency reaches 1 GHz and the processor has a 64/32-bit internal bus structure, with 32/32 KB of level-1 cache and 512 KB of level-2 cache. It carries a 3D graphics acceleration engine (SGX540) and a 2D graphics accelerator, and supports a maximum resolution of 8192*8192. Video encoding supports MPEG-4/H.263/H. at up to 1080@30fps, and MPEG2/VC1/Xvid video can be decoded. In the present invention a Linux PDA operating system is installed specifically for the core board, which makes each module run more efficiently, more stably and with better real-time performance. The hardware block diagram of the core control of the intelligent blind reader is shown in Figure 2.

Image capture module: After the reader is switched on, the user presses the corresponding button to select a recognition mode. A 13-megapixel camera captures a clear image of the text, and the resulting picture is sent to the central control module of the blind reader. The main control chip, which runs Linux, compresses the image; once the 3G/4G module is online, a socket program transmits it to the server. The server recognizes the text in the image, generates the corresponding text and returns it to the front-end acquisition system, where it is finally announced by the voice broadcast module.

The camera compresses images in MJPEG format. MJPEG stands for Motion JPEG: the video signal is compressed with the JPEG algorithm at 25 frames per second to compress moving video. Every frame is compressed individually, as if each frame were an independent image, and a compression ratio of 6:1 can usually be reached. Motion JPEG can produce high-quality, full-screen, full-motion video, so the captured pictures have high resolution and are clear.

The socket uses the TCP transport protocol. TCP (Transmission Control Protocol) is a connection-oriented network transport protocol. It supports multiple data streams and provides flow control, error control and even reordering of messages that arrive out of order, so TCP provides a reliable data transport service and makes data transfer fast and stable.
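Since TCP delivers a byte stream rather than discrete messages, a socket program that uploads images needs some way to mark where one image ends. The sketch below shows one common approach, a 4-byte length prefix. The framing, the function names and the sample bytes are assumptions for illustration; the source does not describe the actual wire format. A local socket pair stands in for the connection between the reader and the server.

```python
import socket
import struct

# Assumed framing: 4-byte big-endian payload length, then the payload.
HEADER = struct.Struct("!I")


def send_image(sock, image_bytes):
    """Send one image as a length-prefixed message over a connected stream socket."""
    sock.sendall(HEADER.pack(len(image_bytes)) + image_bytes)


def recv_exact(sock, n):
    """Read exactly n bytes; a single recv() may return fewer than requested."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_image(sock):
    """Receive one length-prefixed image."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


# Local demonstration: a connected socket pair replaces the real network link.
reader_end, server_end = socket.socketpair()
sample = b"\xff\xd8 placeholder image bytes \xff\xd9"
send_image(reader_end, sample)
received = recv_image(server_end)
reader_end.close()
server_end.close()
print(len(received))
```

On the device the same two functions would be used with a socket created by `socket.create_connection((host, port))`, where the host and port of the recognition server are deployment-specific.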

Voice broadcast module: The voice broadcast module comprises a speech processing chip (SYN6288) and its peripheral circuits. Its main function is to provide the human-machine interaction between the user and the functional modules. When a user uses the blind reader, the collected text information is recognized and the resulting text is sent to the voice broadcast module, which processes the data and announces them through an external loudspeaker or earphones, telling the user the content of the text and thereby completing the reading of printed matter such as books. The peripheral circuit schematic of the speech processing chip is shown in Figure 3.
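How recognized text might be handed to the SYN6288 over a serial port can be sketched as a frame builder. The frame layout used here (header byte 0xFD, two-byte length, command byte 0x01, parameter byte, GB2312 text, XOR checksum) reflects a reading of the chip's datasheet and should be treated as an assumption to be checked against it; the function name is hypothetical and the source does not describe this step.

```python
def build_syn6288_frame(text):
    """Build a speech-synthesis frame for the SYN6288 (assumed layout).

    Layout: 0xFD | length (2 bytes) | command | parameter | text | XOR checksum
    """
    payload = text.encode("gb2312")   # assumed text encoding (parameter 0x00)
    command = 0x01                    # assumed "synthesize and play" command
    parameter = 0x00                  # assumed: no background music, GB2312 text
    length = len(payload) + 3         # command + parameter + checksum bytes
    frame = bytearray([0xFD, length >> 8, length & 0xFF, command, parameter])
    frame += payload
    checksum = 0
    for byte in frame:                # XOR over every byte sent so far
        checksum ^= byte
    frame.append(checksum)
    return bytes(frame)


frame = build_syn6288_frame("宇音天下")
print(frame.hex())
```

The resulting bytes would then be written to the UART connected to the chip; the serial settings and any busy-status handling are outside this sketch.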

Communication module: The communication module is a 4G communication module; network communication is implemented through the wireless broadband access function of the 4G module. 4G refers to the fourth generation of mobile communication standards and technology. It includes the two standards TD-LTE and FDD-LTE, integrates 3G with WLAN, and can transmit high-quality data, audio, video and images quickly. 4G can download at more than 100 Mbps, about 25 times the speed of current home broadband ADSL (4 Mbps), and can satisfy the wireless service requirements of nearly all users. In addition, 4G can be deployed in places not covered by DSL or cable television modems and then extended to the whole area.

The 4G module is connected to the central control module through a USB interface; the USB bus is the communication interface between each module and the system. The ME3760_V2 module is driven by the ECM driver built into the Linux 3.2.0 kernel and is attached through its USB port. Under Linux, the ECM port of the ME3760_V2 module is mapped to 5 interfaces: ECM, AT, Modem, Log, wherein the " " part belongs to the ECM port. To prevent the ECM function from being overridden, this part should be filtered out when the USB serial ports are initialized. The PPP driver of the Linux kernel is then loaded, the remaining interfaces are initialized as USB serial devices, and finally a PPP tool dials up to connect to the 4G network.

The present invention uses the 4G communication module to provide two-way communication between the blind reader front end and the Cloud Server. First, the text image captured by the camera is sent to the Cloud Server for text recognition; second, the Cloud Server sends the recognition result to the main control chip of the blind reader front end, and the voice broadcast module informs the user of the result in real time, completing the reading function. The application circuit schematic of the 4G communication module of the intelligent blind reader is shown in Figure 4.

USB diverter module: Two components are connected to the central control module through USB interfaces: first, the camera, connected to USB interface one; second, the 4G communication module, connected to USB interface two. Both are connected to the central control module through the USB diverter module for information exchange.

FE1.1s is a highly integrated, high-quality, high-performance, low-power and low-cost solution for a USB 2.0 high-speed 4-port hub. It adopts a Single Transaction Translator (STT) architecture to obtain greater benefit. Six, rather than two, non-periodic transaction buffers are used to reduce potential traffic congestion. The whole design is based on state-machine control to reduce response delay; no microcontroller is used in the chip. To guarantee high quality, the whole chip is covered by a Test Scan Chain, including the high-speed (480 MHz) mode, so that all logic elements can be fully tested before shipment. In particular, a Built-In Self-Test (BIST) mode exercises the high-speed, full-speed and low-speed analog front end (AFE) ports during the packaging and testing stages as well. Low power consumption is achieved through a 0.18 μm process and an integrated power/clock control mechanism; most of the circuitry needs no clocking unless it is in use. The working principle of the USB diverter module is shown in Figure 5.

The working principle of the present invention is as follows: the image capture module captures the text image information and sends it to the USB diverter module, which uploads it to the Cloud Server through the communication module. The Cloud Server applies digital image processing to the collected image information to perform pre-processing, text segmentation and text recognition, the recognition itself being carried out by the trained model, and sends the recognition result in text format to the USB diverter module, which forwards it to the central control module. The central control module converts the text-format recognition result into voice information through the Audio Processing Unit of the voice broadcast module; the voice information is sent to the bluetooth unit and played to the reader through the earphone or loudspeaker of the bluetooth unit.
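The server-side pre-processing and text segmentation steps can be illustrated with a small sketch. The techniques used here, fixed-threshold binarization and vertical projection to find character boundaries, are common choices substituted for illustration; this passage does not state which algorithms the Cloud Server actually uses, and the function names and the synthetic image are hypothetical.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 values) to 1 for ink, 0 for background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment_columns(binary):
    """Split a binary text line at empty columns.

    Returns (start, end) column ranges with end exclusive.
    """
    width = len(binary[0])
    # Vertical projection: amount of ink in each column.
    profile = [sum(row[c] for row in binary) for c in range(width)]
    spans, start = [], None
    for c, ink in enumerate(profile):
        if ink and start is None:
            start = c                      # a character region begins
        elif not ink and start is not None:
            spans.append((start, c))       # a blank column ends the region
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


# Tiny synthetic image: two dark regions separated by one blank column.
gray = [
    [255,   0,   0, 255,   0, 255],
    [255,   0, 255, 255,   0,   0],
]
spans = segment_columns(binarize(gray))
print(spans)
```

Each span would then be cropped and passed to the single-character recognition model. Plain projection can wrongly split Chinese characters made of separate left and right components, so a practical system would need additional merging rules.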

Claims

1. An intelligent blind reading method, characterized in that: the image information of a text is collected by an image capture module and uploaded by a USB diverter module to a Cloud Server through a communication module; the Cloud Server performs pre-processing, text segmentation and text recognition on the collected image information and sends the recognition result in text format to the USB diverter module; the USB diverter module forwards it to a central control module; the central control module converts the text-format recognition result into voice information through the Audio Processing Unit of a voice broadcast module and sends it to a bluetooth unit, and the voice is played to the reader through the earphone or loudspeaker of the bluetooth unit.
2. The intelligent blind reading method according to claim 1, characterized in that: the Cloud Server performs the picture pre-processing, text segmentation and text recognition by means of an image analysis system.
3. The intelligent blind reading method according to claim 2, characterized in that: the text recognition is performed with a trained text model, a convolutional neural network from deep learning being used as the training model for single-character recognition.
4. An intelligent blind reader system for the intelligent blind reading method according to claim 1, 2 or 3, characterized in that: it comprises a central control module; the central control module is connected to a voice broadcast module and a USB diverter module; the USB diverter module is connected to the Cloud Server through a communication module; and the USB diverter module is also connected to an image capture module.
5. The intelligent blind reader system according to claim 4, characterized in that: the voice broadcast module comprises an Audio Processing Unit, and the Audio Processing Unit is connected to a bluetooth unit.
6. The intelligent blind reader system according to claim 5, characterized in that: the speech synthesis unit comprises a serial transceiver, a speech synthesis circuit and a voice output circuit; the bluetooth unit comprises a power amplifier and a loudspeaker; the input of the serial transceiver is connected to UART3 of the central control module; the output of the serial transceiver is connected to the voice output circuit through the speech synthesis circuit; and the output of the voice output circuit is connected in turn to the power amplifier and the loudspeaker of the bluetooth unit.
7. The intelligent blind reader system according to claim 4, characterized in that: the central control module comprises a main control chip; the main control chip is connected to UART3 and UART2 respectively through a caching, processing and control unit; UART2 is connected to a reserved serial port; the caching, processing and control unit is also connected to USB.HOST 2.0; and USB.HOST 2.0 is connected to the upstream port of the USB diverter module.
8. The intelligent blind reader system according to claim 4, characterized in that: the USB diverter module comprises a multi-port transceiver control unit; the multi-port transceiver control unit is connected to an upstream port, to an exchange, conversion, caching and processing unit, and to USB1, USB3 and USB2; USB1 is connected to the camera of the image capture module, USB3 is connected to the spare interface of the image capture module, and USB2 is connected to the USB interface of the communication module.
9. The intelligent blind reader system according to claim 4, characterized in that: the communication module comprises a 4G communication chip; the 4G communication chip is connected to a radio-frequency antenna, a SIM card and a USB interface; and the radio-frequency antenna is connected to the Cloud Server.