WO2022075929A1 - Visual word recognition and correction method - Google Patents

Visual word recognition and correction method

Publication number
WO2022075929A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
words
correct
visual
model
Prior art date
Application number
PCT/TR2020/050925
Other languages
French (fr)
Inventor
Adnan ÖNCEVARLIK
Original Assignee
Ünsped Gümrük Müşavirliği Ve Lojistik Hizmetler A.Ş.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ünsped Gümrük Müşavirliği Ve Lojistik Hizmetler A.Ş.
Priority to PCT/TR2020/050925
Publication of WO2022075929A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/232Orthographic correction, e.g. spell checking or vowelisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/103Formatting, i.e. changing of presentation of documents
    • G06F40/106Display of layout of documents; Previewing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/19007Matching; Proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Character Discrimination (AREA)

Abstract

The invention relates to a visual word recognition and correction method that addresses the failure to detect correct words caused by special characters/letters in the alphabets of some languages, and that allows the correct word to be proposed.

Description

VISUAL WORD RECOGNITION AND CORRECTION METHOD
TECHNICAL AREA OF THE INVENTION
The invention relates to a visual word recognition and correction method that addresses the failure to detect correct words caused by special characters/letters in the alphabets of some languages, and that allows the correct word to be proposed.
KNOWN STATUS OF THE TECHNIQUE
In the known state of the art, text- and character-based correction algorithms (for example, Levenshtein distance) are used. In general, candidate words are formed by adding, changing, or deleting letters and are compared with the available dictionary to find the correct spelling of the word.
The Damerau-Levenshtein and Hamming distance methods are applied all over the world, and checks are carried out on the basis of text and characters.
Levenshtein distance and Hamming distance are based on counting the minimum number of letter changes needed to turn one word into another. For example, for the pair “agirlik” and “agirhk”, the method estimates that the correction requires at least 3 letter changes. In particular, these methods are not successful in correcting Turkish words that are written without the characters specific to Turkish. Especially for words containing 2 or more letter errors, text/character-based algorithms consume more resources and cannot achieve accurate results.
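For reference, the edit-distance calculation described above can be sketched as follows. This is a minimal illustration using the standard dynamic-programming formulation of Levenshtein distance and a position-by-position comparison for Hamming distance; the function names are chosen here for illustration and do not come from the application.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(ca != cb for ca, cb in zip(a, b))


print(levenshtein("kitten", "sitting"))  # 3
print(hamming("karolin", "kathrin"))     # 3
```

Both functions compare code points only, so they carry no notion of how similar two characters look when rendered.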
Moreover, existing techniques either find incorrect words or fail to find the intended word because of some letters specific to Turkish (especially ğ, ş, ü, etc.). For example, a pair such as “agirlikli” and “agirhkh” may not be matched at all.
BRIEF DESCRIPTION OF THE INVENTION
The present invention concerns a visual word recognition and correction method that eliminates the disadvantages mentioned above and brings new advantages to the relevant technical field.
The method developed with the invention makes it possible to give more accurate correction results for words that cannot otherwise be found correctly. For the example words given above, and for words in which more than one letter error has been made, the correct word is found by the method presented in the invention. The invention meets this need especially in Turkish word correction, where current methods do not yield effective results.
Visual methods are used in this invention.
The aim of the invention is to present a method that uses artificial neural networks to detect words with letter errors visually and to propose the corrected word that cannot be found in the known state of the art.
Another goal of the invention is to provide a new method used especially in the correction of Turkish words.
BRIEF DESCRIPTION OF FIGURES
Below is a description of the figures prepared for a better understanding of the visual word recognition and correction method developed with this invention.
Figure 1 is a representation of each of the words held in text transformed into a graphic with white dots on a black background.
Figure 2 is an example of the state of the word combinations created on the disk.
Figure 3 shows the class information that helps identify the word during CNN model training.
DETAILED DESCRIPTION OF THE INVENTION
In this detailed description, the subject of the invention is explained only through examples that are intended to aid understanding and have no limiting effect.
The invention relates to a visual word recognition and correction method. Instead of the traditional, widely used Damerau-Levenshtein and Hamming distance methods and other text- and character-based checks, each word is converted into visual form and modeled with artificial neural networks, and the word that it most closely resembles visually is suggested.
With this method, the system finds the correct words that are perceived incorrectly because of some letters specific to the Turkish language (especially ğ, ş, ü, etc.) and suggests the right word. For example, consider the input “sagir” and the word “sağır.” When “sagir” is written, other algorithms may also produce the word “sığır”, but the visual recognition algorithm can perceive “sagir” as “sağır.” This is because it does not analyze the word letter by letter; it looks directly at the word visually and suggests the correct word that most closely resembles it.
With the technique developed in the invention, a training set consisting of the correct spellings of words is created, and a classification model based on convolutional neural networks (CNN: Convolutional Neural Network), one of the visual methods among artificial neural networks, is trained, following the visual system that a person normally uses when reading words. Incoming text is first converted (rendered) to a graphical state; this word image is then given to the pre-trained model for prediction, and the most accurate equivalent of the word, that is, the word with the highest accuracy rate, is returned.
Detailed explanation of UGM Vision (UGM Vision refers to the invention):
The code given as examples in the following descriptions only illustrates in general terms how the work is done; the method is not limited to this code or to this programming language. The essence is the method, and the code examples are given only to describe the method in detail.
First, about 1,300,000 words for Turkish (see the example word table, Table-1) and their combinations (see Table-2) are visualized (Figure-1).
Table-1 (example word table; image not reproduced)
In other words, each of the words held as text is converted into a graphic with white dots on a black background (Figure-1).
Visualization
Python Code:
    from PIL import Image, ImageDraw, ImageFilter, ImageFont

    imgWidth = 200    # image width in pixels
    imgHeight = 16    # image height in pixels
    cColor = 200      # gray level of the text (0 = black, 255 = white)
    nBlur = 0.8       # Gaussian blur radius
    font = ImageFont.truetype("D:/DataPool/DilBilimi/WordCorrector/consola.ttf", 16)

    aWord = "yeni"
    # 8-bit grayscale ("L") image with a black background
    image = Image.new("L", (imgWidth, imgHeight), 0)
    draw = ImageDraw.Draw(image)
    draw.text((1, 0), aWord, font=font, fill=cColor)
    image = image.filter(ImageFilter.GaussianBlur(nBlur))
In the example, the word “yeni” is converted to a PNG graphic that is 200 pixels wide and 16 pixels high. The width and height of the image are not limited to the values in the example and can take other values. In the same way, the 1,300,000 (or more) words, taken in dictionary order, are visualized together with the word combinations created for each of them (see the example word combinations in Table-2).
Example: the word “yeni” is formed as follows.
Table-2 (example combinations of the word “yeni”; image not reproduced)
A file name of the form “nnnnn_wordspelling.png” is used for each combination generated for a word, and the files are saved on disk as training data. Figure 2 shows how sample word combinations appear on disk.
The number at the start of each of these files indicates which word the image actually belongs to. For example, as can be seen in the example image, the word “yeni” begins with 00020_ and its combinations begin with 00020_ in the same way. In CNN model training, this is the class information that helps identify the word (Figure-3). In other words, the prefix 00020_ is used as the equivalent of the word “yeni.” In this way, the correct forms of the classes/words to be used in the classification are determined by assigning a unique prefix to each correct word.
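The naming scheme above can be sketched as follows. This is only an illustration: the helper names are hypothetical, the underscore separator is inferred from the 00020_ prefix, and the five-digit zero padding is assumed from the “nnnnn” pattern.

```python
def build_class_table(correct_words):
    """Assign each correct word a zero-padded five-digit class prefix.

    The numbering here simply follows list order; the application does not
    state how its own indices (such as 00020 for "yeni") were assigned.
    """
    return {word: f"{index:05d}" for index, word in enumerate(correct_words)}


def training_file_name(prefix, spelling):
    """File name for one rendered spelling (correct or a combination)."""
    return f"{prefix}_{spelling}.png"


def class_index_from_file_name(file_name):
    """Recover the integer class index from the numeric prefix."""
    return int(file_name.split("_")[0])


print(training_file_name("00020", "yeni"))           # 00020_yeni.png
print(class_index_from_file_name("00020_yeni.png"))  # 20
```

Because every combination of a word shares that word's prefix, the label for each image can be read back from its file name alone.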
After a class table has been created for the correct forms of the words and a prefix has been assigned to each word, the training data set for the CNN model is prepared. The graphic/image files created are given to the model for training. A very large number of word images is produced at this stage: millions of word-image combinations make up the CNN model's training dataset.
Python Code:
    import numpy as np
    from PIL import Image

    path = "D:/DataPool/DilBilimi/WordCorrector/word_images/gs_frq_words_200_16_dict_kombine/"
    # GetFolders is a helper that is not shown in the source; it is assumed
    # to return the image file names found under path.
    aFiles = GetFolders(path)

    X_train = []
    i = 0
    for aFileName in aFiles:
        im = Image.open(path + aFileName)
        X_train.append(np.array(im, dtype="uint8"))
        i += 1
        if (i % 10000) == 0:
            print("Files processed: {:>7d}".format(i))

    y_train = []
    for aFileName in aFiles:
        # The numeric prefix before "_" is the class (word) index.
        aClass = aFileName.split("_")[0]
        y_train.append(int(aClass))

    X_train = np.array(X_train)
    X_train = np.reshape(X_train, (len(aFiles), imgHeight, imgWidth, 1))
    y_train = np.array(y_train)
X_train holds the data set as a numerical matrix of each word image to be trained on. y_train contains the corresponding word (class) index to be used in the classification.
CNN Model:
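A model is created and trained as follows, using a CNN of the kind used in ordinary image classification.
Python Code:
    # Imports assume the Keras API bundled with TensorFlow; the source does not show them.
    from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.optimizers import SGD

    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same', input_shape=(imgHeight, imgWidth, 1)))
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Dense(256, activation='relu'))
    # wordList is the list of correct words (one class per word); it is not shown in the source.
    model.add(Dense(len(wordList), activation='softmax'))
    # The source passes lr=0.001; recent Keras releases name this argument learning_rate.
    opt = SGD(learning_rate=0.001, momentum=0.9)
    model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['sparse_categorical_accuracy'])
    model.fit(X_train, y_train, epochs=20, batch_size=2048, verbose=1)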
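To make the size of this network concrete, the layer dimensions can be worked out by hand. The sketch below assumes the 16 x 200 single-channel input used in the examples and the layer settings shown above; the helper names and the class count of 1,000 are hypothetical and are used only for the arithmetic.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a 2D convolution layer."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return inputs * units + units


def layer_sizes(img_height=16, img_width=200, num_classes=1000):
    # padding='same' keeps height and width; 2x2 max pooling halves both.
    flat = (img_height // 2) * (img_width // 2) * 32
    return {
        "conv1": conv2d_params(3, 3, 1, 32),
        "conv2": conv2d_params(3, 3, 32, 32),
        "flatten_units": flat,
        "dense1": dense_params(flat, 256),
        "dense2": dense_params(256, 256),
        "output": dense_params(256, num_classes),
    }


for name, value in layer_sizes().items():
    print(f"{name:>14}: {value:,}")
```

Under these assumptions the first dense layer dominates the fixed part of the model, while the output layer grows by 257 parameters for every additional word class.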
The trained model and its weights are then saved.
Python Code:
    # Save the model architecture to disk as JSON.
    # mModelNo is a string variable that is not shown in the source.
    model_json = model.to_json()
    with open("D:/DataPool/DilBilimi/WordCorrector/CNNCorrectorModel_gs_frq_kombine" + mModelNo + ".json", "w") as json_file:
        json_file.write(model_json)

    # Serialize the weights to HDF5
    model.save_weights("D:/DataPool/DilBilimi/WordCorrector/CNNCorrectorWeights_gs_frq_kombine" + mModelNo + ".h5")
    print("->Model and weights saved.")
At prediction time, the entered text is first converted to a graphical state as described above; the resulting image is converted into a matrix data set, and the prediction is made with the pre-trained model.
Python Code:
    # getMostProbableWordSingle and getMostProbableWordMulti are helpers that
    # are not shown in the source.
    def UGMVision(aWord, apply_lev=0):
        X_test = []
        image = Image.new("L", (imgWidth, imgHeight), 0)
        draw = ImageDraw.Draw(image)
        draw.text((1, 0), aWord, font=font, fill=cColor)
        image = image.filter(ImageFilter.GaussianBlur(nBlur))
        X_test.append(np.array(image, dtype="float32"))
        X_test = np.array(X_test)
        X_test = np.reshape(X_test, (1, imgHeight, imgWidth, 1))
        y_predict = model.predict(X_test)
        return getMostProbableWordSingle(y_predict[0], aWord, apply_lev)

    aWord = "birlikte"
    aSentence = "birlikte"
    while aSentence != "exit":
        print(" - ")
        aSentence = input("Enter a sentence: ")
        aList = aSentence.split()
        aMeaning = ""
        for aWord in aList:
            # Each word is rendered and predicted on its own, so X_test is reset per word.
            X_test = []
            image = Image.new("L", (imgWidth, imgHeight), 0)
            draw = ImageDraw.Draw(image)
            draw.text((1, 0), aWord, font=font, fill=cColor)
            image = image.filter(ImageFilter.GaussianBlur(nBlur))
            X_test.append(np.array(image, dtype="float32"))
            X_test = np.array(X_test)
            X_test = np.reshape(X_test, (1, imgHeight, imgWidth, 1))
            y_predict = model.predict(X_test)
            aMeaning = aMeaning + "[ " + getMostProbableWordMulti(y_predict[0], aWord, 0) + " ] "
        print(" - ")
        print("UGM Vision -> {}".format(aMeaning))
Among the results, the closest one is recommended to the user.
The invention is applicable to any language.
Based on the detailed information described above, the visual word recognition and correction method consists of the following process steps:
i. Creating a class table for the correct forms of each word,
ii. Training the classification model, using convolutional neural networks or similar visual classification algorithms or models, on the training set consisting of the correct spellings of words in the relevant language,
iii. Creating a graphic for each of the words to be corrected or checked,
iv. Predicting the generated word image or text with the pre-trained artificial neural network classification model,
v. Suggesting the word to the user by finding the correct word that gives the highest hit rate among the predictions.
The correction algorithm is not limited to Turkish; it can be adapted to any written language that is or has been used around the world.
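The prediction-time steps (iii to v) can be wired together roughly as shown below. This is a sketch under stated assumptions, not the applicant's implementation: suggest_word, fake_render, and fake_predict are hypothetical names, and the two stand-in functions are placeholders for the image rendering and the trained CNN so that the control flow can be run without a model.

```python
from typing import Callable, List, Sequence, Tuple


def suggest_word(
    word: str,
    render: Callable[[str], object],
    predict: Callable[[object], Sequence[float]],
    class_table: Sequence[str],
) -> Tuple[str, float]:
    """Return the highest-scoring correct word and its score for one input."""
    image = render(word)              # step iii: text -> graphic
    probabilities = predict(image)    # step iv: graphic -> one score per class
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_table[best], probabilities[best]   # step v: best match


# Step i stand-in: class index -> correct word (made-up entries).
class_table = ["eski", "yeni", "yol"]


def fake_render(word: str) -> List[int]:
    # Placeholder only: a real renderer would draw the word as an image.
    return [ord(ch) for ch in word]


def fake_predict(image: object) -> List[float]:
    # Placeholder only: a real model would compute these scores from the image.
    return [0.1, 0.7, 0.2]


suggestion, score = suggest_word("yenl", fake_render, fake_predict, class_table)
print(suggestion, score)
```

Passing the renderer and the model in as callables keeps the selection logic independent of any particular imaging or neural-network library.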
Another application of the invention is to convert each of the words into a graphical state using color combinations that ensure high contrast or distinctiveness in the graph creation process mentioned in the method. In the graphic creation process mentioned, the color combination is white dots on a black background.

Claims

CLAIMS
1. Visual word recognition and correction method and feature consists of following process steps;
i. Creating a class table for the correct states of each word,
ii. Learning the classification model by using the visual classification model of the training set consisting of correct spelling of words in the relevant language,
iii. Creating a graph for each of the words that are required to correct or control,
iv. Predicting the generated visual word or text from a pre-trained artificial neural network classification model,
v. Suggesting the word to the user by finding the correct word that gives the highest hit rate from the predicted word.
2. Method according to claim 1 where conversion to a graphical state in said graph creation step use color combinations that ensure high contrast or distinctiveness of each of the words.
3. Method according to claim 2 where the color combination in the said graph creation step is white dots on a black background.
4. Method according to claim 2 where said visual classification model is a model of convolutional neural networks.
PCT/TR2020/050925 2020-10-07 2020-10-07 Visual word recognition and correction method WO2022075929A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/TR2020/050925 WO2022075929A1 (en) 2020-10-07 2020-10-07 Visual word recognition and correction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/TR2020/050925 WO2022075929A1 (en) 2020-10-07 2020-10-07 Visual word recognition and correction method

Publications (1)

Publication Number Publication Date
WO2022075929A1 (en) 2022-04-14

Family

ID=81125648

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/TR2020/050925 WO2022075929A1 (en) 2020-10-07 2020-10-07 Visual word recognition and correction method

Country Status (1)

Country Link
WO (1) WO2022075929A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170339169A1 (en) * 2016-05-23 2017-11-23 GreatHorn, Inc. Computer-implemented methods and systems for identifying visually similar text character strings
US20180137350A1 (en) * 2016-11-14 2018-05-17 Kodak Alaris Inc. System and method of character recognition using fully convolutional neural networks with attention
CN110765996A (en) * 2019-10-21 2020-02-07 北京百度网讯科技有限公司 Text information processing method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20956863; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20956863; Country of ref document: EP; Kind code of ref document: A1)