WO2022075929A1

WO2022075929A1 - Visual word recognition and correction method

Info

Publication number: WO2022075929A1
Application number: PCT/TR2020/050925
Authority: WO
Inventors: Adnan ÖNCEVARLIK
Original assignee: Ünsped Gümrük Müşavirliği Ve Lojistik Hizmetler A.Ş.
Priority date: 2020-10-07
Filing date: 2020-10-07
Publication date: 2022-04-14

Abstract

The invention relates to a method that addresses the failure to detect correct words caused by special characters/letters in the alphabets of some languages. It uses a visual word recognition and correction method that allows the correct word to be proposed.

Description

VISUAL WORD RECOGNITION AND CORRECTION METHOD

TECHNICAL AREA OF THE INVENTION

The invention relates to a method that addresses the failure to detect correct words caused by special characters/letters in the alphabets of some languages. It uses a visual word recognition and correction method that allows the correct word to be proposed.

KNOWN STATUS OF THE TECHNIQUE

In the known state of the technique, text- and character-based correction algorithms (for example, Levenshtein distance) are used. In general, candidate words are formed by adding, changing, or deleting letters and are compared with the dictionary at hand to find the correct spelling of the word.

The Damerau-Levenshtein and Hamming distance methods are applied all over the world, and the checks are conducted on the basis of text and characters.

Levenshtein distance and Hamming distance are based on counting how many letter changes are needed to turn one word into another. For example, they report that converting the word “agirlik” to “ağırlık” requires at least 3 letter changes. In particular, these methods are not successful in correcting Turkish words typed without the characters specific to Turkish. In words with 2 or more letter errors especially, text/character-based algorithms consume more resources and cannot achieve accurate results.
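The character-based distances discussed above can be sketched as follows. This is a minimal illustration, not code from the application; it assumes the intended Turkish spelling in the example is “ağırlık”, and the function names `levenshtein` and `hamming` are chosen here for illustration.

```python
def levenshtein(source: str, target: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn source into target."""
    # previous[j] holds the distance between the already processed
    # prefix of source and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


# ASCII-only typing versus the assumed Turkish spelling
print(levenshtein("agirlik", "ağırlık"))  # 3
print(hamming("agirlik", "ağırlık"))      # 3
```

Both measures treat g/ğ and i/ı as completely different letters, so a word typed without Turkish characters is already three edits away from its correct form.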

Moreover, with the techniques currently applied, incorrect words are found or the intended word is not found at all because of letters specific to Turkish (especially ğ, ş, ü, etc.). For example, when “agirlikli” is written instead of “ağırlıklı”, the correct word may not be found.

BRIEF DESCRIPTION OF THE INVENTION

The present invention concerns a method of visual word recognition and correction that eliminates the disadvantages mentioned above and brings new advantages to the relevant technical field.

With the method developed in the invention, more accurate corrections can be given for words that existing techniques fail to find. The method presented in the invention finds the correct word for the examples given in the known state of the technique above, and for words in which more than one letter error has been made. The invention meets this need, especially in Turkish word correction, where current methods do not yield effective results.

Visual methods are used in this invention.

The aim of the invention is to present a method that uses artificial neural networks to recognize words with letter errors visually and to propose the corrected word in cases where the known state of the technique cannot detect it.

Another goal of the invention is to provide a new method, especially used in correction of Turkish words.

BRIEF DESCRIPTION OF FIGURES

Below is a description of the figures prepared for a better understanding of the visual word recognition and correction method developed with this invention.

Figure 1 is a representation of each of the words held in text transformed into a graphic with white dots on a black background.

Figure 2 is an example of the state of the word combinations created on the disk.

Figure 3 shows the class information that helps identify the word during CNN model training.

DETAILED DESCRIPTION OF THE INVENTION

In this detailed description, the innovation that is the subject of the invention is explained only through examples that are intended to aid understanding and have no limiting effect.

The invention relates to a method of visual word recognition and correction. Instead of the traditional, widely used Damerau-Levenshtein and Hamming distance methods and other text- and character-based checks, each word is converted into visual form and modeled with artificial neural networks, and the word that most closely resembles it visually is suggested.

With this method, the system finds the correct words that are currently misrecognized because of letters specific to the Turkish language (especially ğ, ş, ü, etc.) and suggests the right word. Take the typed word “sagir” and the correct word “sağır” as an example. When “sagir” is written, other algorithms may also produce the word “sığır”, whereas the visual recognition algorithm perceives “sagir” as “sağır”. This is because it does not analyze the word letter by letter; it looks at the word as an image and suggests the correct word that most closely resembles it.
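The idea of comparing words as images rather than as letter sequences can be illustrated with a toy sketch. It is not the method of the invention: a plain nearest-neighbour pixel comparison is swapped in for the neural network, the 3x6 bitmap glyphs are hypothetical, and the intended spellings are assumed to be “sağır” and “sığır”. All names (`GLYPHS`, `render`, `pixel_distance`, `closest_word`) are introduced here for illustration, and the sketch only handles words of equal length.

```python
# Hypothetical 3x6 bitmap glyphs ('#' = lit pixel). Row 0 is reserved for
# diacritics such as the dot of "i" or the breve of "ğ".
GLYPHS = {
    "s": ("...", ".##", "#..", ".#.", "..#", "##."),
    "a": ("...", ".#.", "#.#", "###", "#.#", "#.#"),
    "g": ("...", ".##", "#.#", ".##", "..#", "##."),
    "ğ": ("#.#", ".##", "#.#", ".##", "..#", "##."),
    "i": (".#.", "...", ".#.", ".#.", ".#.", ".#."),
    "ı": ("...", "...", ".#.", ".#.", ".#.", ".#."),
    "r": ("...", "...", "#.#", "##.", "#..", "#.."),
}


def render(word):
    """Draw a word as 6 pixel rows by placing its glyphs side by side."""
    return ["".join(GLYPHS[ch][row] for ch in word) for row in range(6)]


def pixel_distance(image_a, image_b):
    """Count the pixels that differ between two equally sized images."""
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(image_a, image_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def closest_word(typed, dictionary):
    """Return the dictionary word whose image is nearest to the typed word."""
    typed_image = render(typed)
    return min(dictionary, key=lambda w: pixel_distance(typed_image, render(w)))


dictionary = ["sağır", "sığır"]  # correct spellings only
print(closest_word("sagir", dictionary))  # sağır
```

With these glyphs, “sagir” differs from “sağır” in 3 pixels (the breve and one dot) but from “sığır” in 15, so visually similar letters count as small differences instead of full letter changes.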

With the technique developed in the invention, a training set consisting of the correct spellings of words is created, and a classification model is trained on it using convolutional neural networks (CNN: Convolutional Neural Network), a visual method from the family of artificial neural networks. The approach is based on the visual system that a person normally uses when reading words. Incoming text is first converted into a graphical state (an image). The pre-trained model then makes a prediction for this word image, and the word with the highest accuracy rate, that is, the most accurate equivalent of the input, is returned.

Detailed explanation of UGM Vision: Ugm refers to the invention.

The code given as examples in the following descriptions only illustrates in general terms how the work is done; the method is not limited to this code or to these programming languages. The essence is the method, and the code examples serve only to describe it in detail.

First, about 1,300,000 Turkish words (see the example word table, Table-1) and their combinations (see Table-2) are visualized (Figure-1).

Table - 1

In other words, each of the words held as text is converted into a graphic with white dots on a black background (Figure- 1).

Visualization

Python Code:

    # Pillow imports (not shown in the source listing)
    from PIL import Image, ImageDraw, ImageFilter, ImageFont

    imgWidth = 200   # image width in pixels
    imgHeight = 16   # image height in pixels
    cColor = 200     # gray level of the text on the black background
    nBlur = 0.8      # Gaussian blur radius
    font = ImageFont.truetype("D:/DataPool/DilBilimi/WordCorrector/consola.ttf", 16)
    aWord = "yeni"

    # Black 8-bit grayscale canvas, light text, slight blur
    image = Image.new("L", (imgWidth, imgHeight), 0)
    draw = ImageDraw.Draw(image)
    draw.text((1, 0), aWord, font=font, fill=cColor)
    image = image.filter(ImageFilter.GaussianBlur(nBlur))

In the example, the word “yeni” has been converted to a PNG graphic that is 200 pixels wide and 16 pixels high. The width and height of the graphic are not limited to the numbers specified in the example and can take other values. In this way, the 1,300,000 (or more) words in the dictionary are visualized in sequence, together with the word combinations created for each of them (see the example word combinations in Table-2).

Example: the word “yeni” is formed as follows.

Table-2

A format like “nnnnn_wordspelling.png” is used as the file name for the combinations created for each word, and the files are saved on disk as training data. Figure 2 shows the state of the sample word combinations on the disk.

The number at the front of each of these combination files indicates which word the image actually belongs to. For example, as can be seen in the example image below, the file for the word “yeni” begins with 00020_ and the files for its combinations begin with 00020_ in the same way. During CNN model training, this is the class information that helps identify the word (Figure-3). In other words, the prefix 00020_ is used as the equivalent of the word “yeni.” In this way, the classes to be used in the classification, that is, the correct forms of the words, are determined by assigning a unique prefix to each correct word.
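The naming scheme described above can be sketched as a pair of helpers. This is a minimal sketch under stated assumptions: the separator is taken to be an underscore (as the 00020_ prefix suggests), the helper names `make_file_name` and `class_from_file_name` are hypothetical, and the misspelled variant “yenl” is invented for illustration.

```python
def make_file_name(class_index: int, spelling: str) -> str:
    """Build 'nnnnn_spelling.png' with a zero-padded class prefix."""
    # {:05d} pads to at least five digits; larger indices simply use more.
    return "{:05d}_{}.png".format(class_index, spelling)


def class_from_file_name(file_name: str) -> int:
    """Recover the class index from the prefix before the first underscore."""
    return int(file_name.split("_")[0])


correct = make_file_name(20, "yeni")
variant = make_file_name(20, "yenl")  # hypothetical misspelled combination
print(correct)                        # 00020_yeni.png
print(class_from_file_name(variant))  # 20
```

Because every combination of a word carries the same prefix, the label for training can be read from the file name alone, with no separate label file.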

After a class table has been created for the correct forms of the words and a prefix has been assigned to each word, the training data set for the CNN model is prepared. The graphic/image files created are given to the model for training. A very large number of word images is produced here: millions of word-image combinations make up the CNN model's training dataset.

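Python Code:

    import numpy as np
    from PIL import Image

    path = "D:/DataPool/DilBilimi/WordCorrector/word_images/gs_frq_words_200_16_dict_kombine/"
    # GetFolders is the author's helper that lists the image file names;
    # its definition is not shown in the source.
    aFiles = GetFolders(path)

    # Load every word image as an 8-bit matrix
    X_train = []
    i = 0
    for aFileName in aFiles:
        im = Image.open(path + aFileName)
        X_train.append(np.array(im, dtype="uint8"))
        i += 1
        if (i % 10000) == 0:
            # "Number of files processed"
            print("İşlenen Dosya Sayısı: {:>7d}".format(i))

    # The numeric prefix of each file name is the class (word) index
    y_train = []
    for aFileName in aFiles:
        aClass = aFileName.split("_")[0]
        y_train.append(int(aClass))

    # Shape: (number of images, height, width, 1 channel)
    X_train = np.array(X_train)
    X_train = np.reshape(X_train, (len(aFiles), imgHeight, imgWidth, 1))
    y_train = np.array(y_train)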

X_train holds the data set as a numerical matrix for each word image to be used in training. y_train contains the index of the target word to be used in the classification.

CNN Model:

A model is created and trained in the following way, using the CNN algorithm commonly used in ordinary image classification.

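Python Code:

    # Assumes the Keras API bundled with TensorFlow; the source does not show its imports.
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
    from tensorflow.keras.optimizers import SGD

    model = Sequential()
    # Two 3x3 convolutions over the 16x200 single-channel word image
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform',
                     padding='same', input_shape=(imgHeight, imgWidth, 1)))
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform',
                     padding='same'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Dense(256, activation='relu'))
    # One output per correct word in the class table
    model.add(Dense(len(wordList), activation='softmax'))

    # learning_rate replaces the older lr argument of the source listing
    opt = SGD(learning_rate=0.001, momentum=0.9)
    model.compile(optimizer=opt, loss='sparse_categorical_crossentropy',
                  metrics=['sparse_categorical_accuracy'])
    model.fit(X_train, y_train, epochs=20, batch_size=2048, verbose=1)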
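To give a feel for the size of the layer stack listed above, the following sketch counts its trainable parameters by hand. It is an illustration only: the function name `cnn_parameter_count` is hypothetical, `num_classes` stands in for `len(wordList)`, and the value 1000 used in the example is an arbitrary assumption, not a figure from the application.

```python
def cnn_parameter_count(num_classes, img_height=16, img_width=200):
    """Count the trainable parameters (weights + biases) of the listed stack."""
    conv1 = (3 * 3 * 1) * 32 + 32    # 3x3 kernels, 1 input channel, 32 filters
    conv2 = (3 * 3 * 32) * 32 + 32   # 3x3 kernels, 32 input channels, 32 filters
    # 'same' padding keeps the image size; 2x2 max pooling halves both axes.
    flat = (img_height // 2) * (img_width // 2) * 32
    dense1 = flat * 256 + 256
    dense2 = 256 * 256 + 256
    output = 256 * num_classes + num_classes
    return conv1 + conv2 + dense1 + dense2 + output


print(cnn_parameter_count(1000))  # 6886216
```

Every additional class adds 257 parameters to the softmax layer, so the output layer grows linearly with the size of the class table while the rest of the network stays fixed.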

The trained model and its weights are then saved.

Python Code:

    # Save the model architecture to disk as JSON
    model_json = model.to_json()
    with open("D:/DataPool/DilBilimi/WordCorrector/CNNCorrectorModel_gs_frq_kombine" + mModelNo + ".json", "w") as json_file:
        json_file.write(model_json)

    # Serialize the weights to HDF5
    model.save_weights("D:/DataPool/DilBilimi/WordCorrector/CNNCorrectorWeights_gs_frq_kombine" + mModelNo + ".h5")
    print('->Model ve Weights kaydedildi.')  # "Model and weights saved."

At prediction time, the entered text is first converted to a graphical state as described above. The resulting image is then converted into a matrix data set, and a prediction is made using the pre-trained model.

Python Code:

    def UGMVision(aWord, apply_lev=0):
        # Render the word exactly as the training images were rendered
        X_test = []
        image = Image.new("L", (imgWidth, imgHeight), 0)
        draw = ImageDraw.Draw(image)
        draw.text((1, 0), aWord, font=font, fill=cColor)
        image = image.filter(ImageFilter.GaussianBlur(nBlur))

        X_test.append(np.array(image, dtype="float32"))
        X_test = np.array(X_test)
        X_test = np.reshape(X_test, (1, imgHeight, imgWidth, 1))
        y_predict = model.predict(X_test)
        # getMostProbableWordSingle is the author's helper; not shown in the source
        return getMostProbableWordSingle(y_predict[0], aWord, apply_lev)

    aWord = "birlikte"
    aSentence = "birlikte"
    while aSentence != "exit":
        print(" - ")
        aSentence = input("Cumleyi Girin:")  # "Enter the sentence:"
        aList = aSentence.split()
        aMeaning = ""

        for aWord in aList:
            X_test = []  # reset for every word so each prediction sees one image
            image = Image.new("L", (imgWidth, imgHeight), 0)
            draw = ImageDraw.Draw(image)
            draw.text((1, 0), aWord, font=font, fill=cColor)
            image = image.filter(ImageFilter.GaussianBlur(nBlur))

            X_test.append(np.array(image, dtype="float32"))
            X_test = np.array(X_test)
            X_test = np.reshape(X_test, (1, imgHeight, imgWidth, 1))
            y_predict = model.predict(X_test)
            # getMostProbableWordMulti is the author's helper; not shown in the source
            aMeaning = aMeaning + "[ " + getMostProbableWordMulti(y_predict[0], aWord, 0) + " ] - "

        print(" - ")
        print("UGM Vision -> {}".format(aMeaning))

The closest match among the results is recommended to the user.
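Selecting the closest result can be sketched as picking the class with the highest predicted probability. The source does not show its own selection helpers, so this is only an assumed, simplified stand-in: the name `most_probable_word`, the three-word class table, and the scores are all hypothetical, and class index i is assumed to map to position i of the word list.

```python
def most_probable_word(probabilities, word_list):
    """Return (word, probability) for the highest-scoring class."""
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return word_list[best_index], probabilities[best_index]


word_list = ["bir", "birlik", "birlikte"]  # hypothetical class table
scores = [0.05, 0.15, 0.80]                # hypothetical softmax output
print(most_probable_word(scores, word_list))  # ('birlikte', 0.8)
```

Returning the probability alongside the word leaves room for a caller to reject low-confidence suggestions, which is a design choice of this sketch rather than something stated in the source.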

The invention is applicable to any language.

Based on the detailed information described above, the visual word recognition and correction method consists of the following process steps:

i. Creating a class table for the correct states of each word,
ii. Training the classification model, using convolutional neural networks or similar visual classification algorithms or models, on the training set consisting of the correct spellings of words in the relevant language,
iii. Creating a graphic for each of the words to be corrected or checked,
iv. Predicting the generated visual word or text with the pre-trained artificial neural network classification model,
v. Suggesting the word to the user by finding the correct word that gives the highest hit rate among the predicted words.

The correction algorithm is not limited to Turkish; it can be adapted to any visually writable language that is or has been used around the world.

In another application of the invention, each word is converted into a graphical state using color combinations that ensure high contrast or distinctiveness in the graphic creation process mentioned in the method. In the graphic creation process described here, the color combination is white dots on a black background.

Claims

1. Visual word recognition and correction method, characterized in that it consists of the following process steps:
i. Creating a class table for the correct states of each word,
ii. Learning the classification model by using the visual classification model of the training set consisting of correct spelling of words in the relevant language,
iii. Creating a graph for each of the words that are required to correct or control,
iv. Predicting the generated visual word or text from a pre-trained artificial neural network classification model,
v. Suggesting the word to the user by finding the correct word that gives the highest hit rate from the predicted word.

2. Method according to claim 1, where conversion to a graphical state in said graph creation step uses color combinations that ensure high contrast or distinctiveness of each of the words.

3. Method according to claim 2, where the color combination in the said graph creation step is white dots on a black background.

4. Method according to claim 2, where said visual classification model is a model of convolutional neural networks.