WO2019024050A1 - Deep context-based grammatical error correction using artificial neural networks - Google Patents
- Publication number
- WO2019024050A1 (PCT/CN2017/095841)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- word
- target word
- target
- sentence
- grammatical error
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
Definitions
- the disclosure relates generally to artificial intelligence, and more particularly, to automated grammatical error correction (GEC) using artificial neural networks.
- a method for grammatical error detection is disclosed.
- a sentence is received.
- One or more target words in the sentence are identified based, at least in part, on one or more grammatical error types.
- Each of the one or more target words corresponds to at least one of the one or more grammatical error types.
- for at least one of the one or more target words, a classification of the target word with respect to the corresponding grammatical error type is estimated using an artificial neural network model trained for the grammatical error type.
- the model includes two recurrent neural networks configured to output a context vector of the target word based, at least in part, on at least one word before the target word and at least one word after the target word in the sentence.
- the model further includes a feedforward neural network configured to output a classification value of the target word with respect to the grammatical error type based, at least in part, on the context vector of the target word.
- a grammatical error in the sentence is detected based, at least in part, on the target word and the estimated classification of the target word.
- a method for training an artificial neural network model is provided.
- An artificial neural network model for estimating a classification of a target word in a sentence with respect to a grammatical error type is provided.
- the model includes two recurrent neural networks configured to output a context vector of the target word based, at least in part, on at least one word before the target word and at least one word after the target word in the sentence.
- the model further includes a feedforward neural network configured to output a classification value of the target word based, at least in part, on the context vector of the target word.
- a set of training samples are obtained.
- Each training sample in the set of training samples includes a sentence including a target word with respect to the grammatical error type and an actual classification of the target word with respect to the grammatical error type.
- a first set of parameters associated with the recurrent neural networks and a second set of parameters associated with the feedforward neural network are jointly trained based, at least in part, on differences between the actual classifications and estimated classifications of the target words in each training sample.
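To make this joint training step concrete, the following Python (PyTorch) sketch adjusts the parameters of both the recurrent networks and the feedforward network from the classification differences described above. It assumes a model of the shape sketched later in the detailed description (two recurrent networks whose concatenated output feeds a softmax classifier); the Adam optimizer, learning rate, and negative log-likelihood loss are illustrative choices not specified by the disclosure.

```python
import torch

def train_jointly(model, samples, epochs=3, lr=1e-3):
    """Jointly adjust the recurrent-network and feedforward-network parameters from
    (forward_context, backward_context, actual_label) training samples."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # one optimizer covers both sub-models
    for _ in range(epochs):
        for fwd_ctx, bwd_ctx, label in samples:
            probs = model(fwd_ctx, bwd_ctx)          # estimated classification distribution, (1, C)
            target = torch.tensor([label])           # actual classification (original label)
            # Loss is driven by the difference between actual and estimated classifications.
            loss = torch.nn.functional.nll_loss(torch.log(probs + 1e-9), target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```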
- a system for grammatical error detection includes a memory and at least one processor coupled to the memory.
- the at least one processor is configured to receive a sentence and identify one or more target words in the sentence based, at least in part, on one or more grammatical error types. Each of the one or more target words corresponds to at least one of the one or more grammatical error types.
- the at least one processor is further configured to, for at least one of the one or more target words, estimate a classification of the target word with respect to the corresponding grammatical error type using an artificial neural network model trained for the grammatical error type.
- the model includes two recurrent neural networks configured to generate a context vector of the target word based, at least in part, on at least one word before the target word and at least one word after the target word in the sentence.
- the model further includes a feedforward neural network configured to output a classification value of the target word with respect to the grammatical error type based, at least in part, on the context vector of the target word.
- the at least one processor is further configured to detect a grammatical error in the sentence based, at least in part, on the target word and the estimated classification of the target word.
- a system for grammatical error detection includes a memory and at least one processor coupled to the memory.
- the at least one processor is configured to provide an artificial neural network model for estimating a classification of a target word in a sentence with respect to a grammatical error type.
- the model includes two recurrent neural networks configured to output a context vector of the target word based, at least in part, on at least one word before the target word and at least one word after the target word in the sentence.
- the model further includes a feedforward neural network configured to output a classification value of the target word based, at least in part, on the context vector of the target word.
- the at least one processor is further configured to obtain a set of training samples.
- Each training sample in the set of training samples includes a sentence including a target word with respect to the grammatical error type and an actual classification of the target word with respect to the grammatical error type.
- the at least one processor is further configured to jointly adjust a first set of parameters associated with the recurrent neural networks and a second set of parameters associated with the feedforward neural network based, at least in part, on differences between the actual classifications and estimated classifications of the target words in each training sample.
- a software product, in accord with this concept, includes at least one computer-readable, non-transitory device and information carried by the device.
- the information carried by the device may be executable instructions regarding parameters in association with a request or operational parameters.
- a tangible computer-readable and non-transitory device having instructions recorded thereon for grammatical error detection, wherein the instructions, when executed by the computer, cause the computer to perform a series of operations.
- A sentence is received.
- One or more target words in the sentence are identified based, at least in part, on one or more grammatical error types.
- Each of the one or more target words corresponds to at least one of the one or more grammatical error types.
- for at least one of the one or more target words, a classification of the target word with respect to the corresponding grammatical error type is estimated using an artificial neural network model trained for the grammatical error type.
- the model includes two recurrent neural networks configured to output a context vector of the target word based, at least in part, on at least one word before the target word and at least one word after the target word in the sentence.
- the model further includes a feedforward neural network configured to output a classification value of the target word with respect to the grammatical error type based, at least in part, on the context vector of the target word.
- a grammatical error in the sentence is detected based, at least in part, on the target word and the estimated classification of the target word.
- a tangible computer-readable and non-transitory device having instructions recorded thereon for training an artificial neural network model, wherein the instructions, when executed by the computer, cause the computer to perform a series of operations.
- An artificial neural network model for estimating a classification of a target word in a sentence with respect to a grammatical error type is provided.
- the model includes two recurrent neural networks configured to output a context vector of the target word based, at least in part, on at least one word before the target word and at least one word after the target word in the sentence.
- the model further includes a feedforward neural network configured to output a classification value of the target word based, at least in part, on the context vector of the target word.
- a set of training samples are obtained.
- Each training sample in the set of training samples includes a sentence including a target word with respect to the grammatical error type and an actual classification of the target word with respect to the grammatical error type.
- a first set of parameters associated with the recurrent neural networks and a second set of parameters associated with the feedforward neural network are jointly trained based, at least in part, on differences between the actual classifications and estimated classifications of the target words in each training sample.
- FIG. 1 is a block diagram illustrating a grammatical error correction (GEC) system in accordance with an embodiment
- FIG. 2 is a depiction of an example of automated grammatical error correction performed by the system in FIG. 1;
- FIG. 3 is a flow chart illustrating an example of a method for grammatical error correction in accordance with an embodiment
- FIG. 4 is a block diagram illustrating an example of a classification-based GEC module of the system in FIG. 1 in accordance with an embodiment
- FIG. 5 is a depiction of an example of providing a classification of a target word in a sentence using the system in FIG. 1 in accordance with an embodiment
- FIG. 6 is a schematic diagram illustrating an example of an artificial neural network (ANN) model for grammatical error correction in accordance with an embodiment
- FIG. 7 is a schematic diagram illustrating another example of an ANN model for grammatical error correction in accordance with an embodiment
- FIG. 8 is a detailed schematic diagram illustrating an example of the ANN model in FIG. 6 in accordance with an embodiment
- FIG. 9 is a flow chart illustrating an example of a method for grammatical error correction of a sentence in accordance with an embodiment
- FIG. 10 is a flow chart illustrating an example of a method for classifying a target word with respect to a grammatical error type in accordance with an embodiment
- FIG. 11 is a flow chart illustrating another example of a method for classifying a target word with respect to a grammatical error type in accordance with an embodiment
- FIG. 12 is a flow chart illustrating an example of a method for providing a grammar score in accordance with an embodiment
- FIG. 13 is a block diagram illustrating an ANN model training system in accordance with an embodiment
- FIG. 14 is a depiction of an example of a training sample used by the system in FIG. 13;
- FIG. 15 is a flow chart illustrating an example of a method for ANN model training for grammatical error correction in accordance with an embodiment
- FIG. 16 is a schematic diagram illustrating an example of training an ANN model for grammatical error correction in accordance with an embodiment
- FIG. 17 is a block diagram illustrating an example of a computer system useful for implementing various embodiments set forth in the disclosure.
- terms, such as “a,” “an,” or “the,” again, may be understood to convey a singular usage or to convey a plural usage, depending at least in part upon context.
- the term “based on” may be understood as not necessarily intended to convey an exclusive set of factors and may, instead, allow for existence of additional factors not necessarily expressly described, again, depending at least in part on context.
- the automated GEC systems and methods disclosed herein provide the ability to efficiently and effectively detect and correct grammatical errors using a deep context model that can be trained from native text data.
- the error correction task can be treated as a classification problem where the grammatical context representation can be learnt from native text data that is widely available.
- the systems and methods disclosed herein do not require sophisticated feature engineering, which usually requires linguistic knowledge and may not cover all context patterns.
- the systems and methods disclosed herein can use deep features directly, such as recurrent neural networks to represent context.
- the systems and methods disclosed herein can leverage the abundant native plain text corpora and learn context representation and classification jointly in an end-to-end fashion to correct grammatical errors effectively.
- FIG. 1 is a block diagram illustrating a GEC system 100 in accordance with an embodiment.
- GEC system 100 includes an input pre-processing module 102, a parsing module 104, a target word dispatching module 106, and a plurality of classification-based GEC modules 108, each of which is configured to perform classification-based grammatical error detection and correction using deep context.
- GEC system 100 may be implemented using a pipeline architecture to combine other GEC methods, such as machine translation and predefined rule-based methods, with the classification-based method to further improve the performance of GEC system 100.
- GEC system 100 may further include a machine translation-based GEC module 110, a rule-based GEC module 112, and a scoring/correction module 114.
- Input pre-processing module 102 is configured to receive an input text 116 and pre-process input text 116.
- Input text 116 may include at least one English sentence, for example, a single sentence, a paragraph, an article, or any text corpus.
- Input text 116 may be received directly, for example, via hand writing, typing, or copying/pasting.
- Input text 116 may be received indirectly as well, for example, via speech recognition or image recognition.
- any suitable speech recognition techniques may be used to convert voice input into input text 116.
- any suitable optical character recognition (OCR) techniques may be used to transfer text contained in images into input text 116.
- Input pre-processing module 102 may pre-process input text 116 in various manners. In some embodiments, as grammatical errors are usually analyzed in the context of a particular sentence, input pre-processing module 102 may divide input text 116 into sentences so that each sentence can be treated as a unit for the later process. Partitioning input text 116 into sentences may be performed by recognizing the beginning and/or end of a sentence. For example, input pre-processing module 102 may search for certain punctuation marks, such as a period, semicolon, question mark, or exclamation mark, as indicators of the end of a sentence. Input pre-processing module 102 may also search for a word with the first letter capitalized as an indicator of the start of a sentence.
- input pre-processing module 102 may lowercase input text 116 for ease of the later process, for example, by converting any uppercase letters in input text 116 to lowercase letters.
- input pre-processing module 102 may also check the tokens (words, phrases, or any text strings) in input text 116 against a vocabulary database 118 to determine any tokens that are not in vocabulary database 118.
- the unmatched tokens may be treated as special tokens, e.g., single unk tokens (unknown tokens).
- Vocabulary database 118 includes all the words that can be processed by GEC system 100. Any words or other tokens that are not in vocabulary database 118 may be ignored or treated differently by GEC system 100.
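As a rough illustration of the pre-processing described above, the sketch below splits an input text into sentences at sentence-final punctuation, lowercases it, and maps out-of-vocabulary tokens to a special unk token. The regular expressions, the `<unk>` name, and the toy vocabulary are simplifications, not the disclosure's actual implementation.

```python
import re

UNK = "<unk>"  # special token for out-of-vocabulary words (hypothetical name)

def preprocess(input_text, vocabulary):
    """Split text into sentences, lowercase it, and map OOV tokens to UNK."""
    # Period, semicolon, question mark, and exclamation mark end a sentence (see above).
    sentences = re.split(r"(?<=[.;?!])\s+", input_text.strip())
    processed = []
    for sentence in sentences:
        tokens = re.findall(r"[\w']+|[.;?!,]", sentence.lower())  # crude tokenizer
        processed.append([t if t in vocabulary else UNK for t in tokens])
    return processed

vocab = {"i", "go", "to", "school", "everyday", "she", "it", ".", "!"}
print(preprocess("I go to school everyday. She like it!", vocab))
# "like" is not in the toy vocabulary, so it becomes "<unk>"
```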
- Parsing module 104 is configured to parse input text 116 to identify one or more target words in each sentence of input text 116. Unlike known systems that treat all grammatical errors uniformly and attempt to translate incorrect text into correct text, GEC system 100 uses models trained for each specific grammatical error type, as described below in detail. Thus, in some embodiments, parsing module 104 may identify the target words from the text tokens in each sentence based on predefined grammatical error types so that each target word corresponds to at least one of the grammatical error types.
- the grammatical error types include, but are not limited to, the article error, subject-verb agreement error, verb form error, preposition error, and noun number error.
- parsing module 104 may tokenize each sentence and identify the target words from the tokens in conjunction with vocabulary database 118, which includes vocabulary information and knowledge known to GEC system 100.
- for the subject-verb agreement error, parsing module 104 may extract the mapping relationships between non-third-person singular present words and third-person singular present words in advance. Parsing module 104 then may locate the verbs as the target words. For the article error, parsing module 104 may locate the nouns and noun phrases (combinations of noun words and adjective words) as the target words. For the verb form error, parsing module 104 may locate the verbs as the target words, which are in the base form, gerund or present participle, or past participle. Regarding the preposition error, parsing module 104 may locate the prepositions as the target words. As to the noun number error, parsing module 104 may locate the nouns as the target words.
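A possible implementation of this target-word identification, using PoS tags as described; spaCy and its tag set are illustrative choices here (the disclosure does not prescribe a tagger, and later mentions Stanford CoreNLP as one option):

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # any PoS tagger would do; spaCy is illustrative

# Map each grammatical error type to a predicate over PoS tags.
TARGET_PREDICATES = {
    "subject_verb_agreement": lambda tok: tok.tag_ in {"VBZ", "VBP", "VB"},
    "verb_form":              lambda tok: tok.tag_ in {"VB", "VBG", "VBN"},
    "preposition":            lambda tok: tok.pos_ == "ADP",
    "noun_number":            lambda tok: tok.tag_ in {"NN", "NNS"},
    "article":                lambda tok: tok.pos_ == "NOUN",
}

def identify_target_words(sentence):
    """Return (token index, word, error type) triples for each target word."""
    doc = nlp(sentence)
    targets = []
    for tok in doc:
        for error_type, predicate in TARGET_PREDICATES.items():
            if predicate(tok):  # one word may match several error types
                targets.append((tok.i, tok.text, error_type))
    return targets

print(identify_target_words("She go to school everyday"))
```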
- one word may be identified by parsing module 104 as corresponding to multiple grammatical error types.
- a verb may be identified as the target word with respect to the subject-verb agreement error and the verb form error
- a noun or noun phrase may be identified as the target word with respect to the article error and the noun number error.
- a target word may include a phrase that is a combination of multiple words, such as a noun phrase.
- parsing module 104 may be configured to determine the actual classification of each target word. Parsing module 104 may assign an original label to each target word with respect to the corresponding grammatical error type as the actual classification value of the target word. For example, for the subject-verb agreement error, the actual classification of a verb is either the third person singular present form or the base form. Parsing module 104 may assign the target word the original label, for example, “1” if the target word is in the third person singular present form or “0” if the target word is in the base form. For the article error, the actual classifications of the target words may be “a/an,” “the,” or “no article.”
- Parsing module 104 may check the article in front of the target word (a noun word or noun phrase) to determine the actual classification of each target word.
- the actual classifications of the target words (e.g., verbs) with respect to the verb form error may be the base form, gerund or present participle, or past participle.
- for the preposition error, the most frequently used prepositions may be used by parsing module 104 as the actual classifications.
- the actual classifications include 12 original labels: “about,” “at,” “by,” “for,” “from,” “in,” “of,” “on,” “to,” “until,” “with,” and “against.”
- the actual classifications of the target words with respect to the noun number error may be the singular form or plural form.
- parsing module 104 may determine the original label of each target word with respect to the corresponding grammatical error type based on the part of speech (PoS) tags in conjunction with vocabulary database 118.
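A sketch of such original-label assignment from Penn Treebank-style PoS tags (e.g., spaCy's `token.tag_`), with the label sets following the prose above; the function name and signature are hypothetical:

```python
def original_label(word, tag, prev_word, error_type):
    """Derive the actual classification (original label) of a target word."""
    if error_type == "subject_verb_agreement":
        return 1 if tag == "VBZ" else 0                      # third person singular vs. base form
    if error_type == "verb_form":
        return {"VB": "base", "VBG": "gerund/present participle",
                "VBN": "past participle"}.get(tag)
    if error_type == "preposition":
        return word.lower()                                  # the label is the preposition itself
    if error_type == "noun_number":
        return "plural" if tag == "NNS" else "singular"
    if error_type == "article":
        if prev_word in ("a", "an"):                          # check the article before the noun
            return "a/an"
        return "the" if prev_word == "the" else "no article"
    raise ValueError(f"unsupported error type: {error_type}")

print(original_label("adding", "VBG", None, "verb_form"))    # -> gerund/present participle
```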
- Target word dispatching module 106 is configured to dispatch each target word to classification-based GEC module 108 for the corresponding grammatical error type.
- an ANN model 120 is independently trained and used by corresponding classification-based GEC module 108.
- each classification-based GEC module 108 is associated with one specific grammatical error type and is configured to handle the target words with respect to the same grammatical error type. For example, for a target word that is a preposition (with respect to the preposition error type), target word dispatching module 106 may send the preposition to the classification-based GEC module 108 that handles preposition errors.
- target word dispatching module 106 may send the same target word to multiple classification-based GEC modules 108. It is also to be appreciated that in some embodiments, the resources assigned by GEC system 100 to each classification-based GEC module 108 may not be equal. For example, depending on the frequency with which each grammatical error type occurs within a certain user cohort or for a particular user, target word dispatching module 106 may dispatch the target words with respect to the most frequently occurring grammatical error type with the highest priority.
- target word dispatching module 106 may schedule the processing of each target word in each sentence in an optimal manner in view of the workload of each classification-based GEC module 108 to reduce latency.
- Each classification-based GEC module 108 includes corresponding ANN model 120 that has been trained for the corresponding grammatical error type.
- Classification-based GEC module 108 is configured to estimate a classification of the target word with respect to the corresponding grammatical error type using corresponding ANN model 120.
- ANN model 120 includes two recurrent neural networks configured to output a context vector of the target word based on at least one word before the target word and at least one word after the target word in the sentence.
- ANN model 120 further includes a feedforward neural network configured to output a classification value of the target word with respect to the grammatical error type based on the context vector of the target word.
- Classification-based GEC module 108 is further configured to detect a grammatical error in the sentence based on the target word and the estimated classification of the target word. As described above, in some embodiments, the actual classification of each target word may be determined by parsing module 104. Classification-based GEC module 108 then may compare the estimated classification of the target word with the actual classification of the target word, and detect the grammatical error in the sentence when the actual classification does not match the estimated classification of the target word. For example, for a certain grammatical error type, corresponding ANN model 120 may learn an embedding function of the variable-length context surrounding the target word, and corresponding classification-based GEC module 108 may predict the classification of the target word with the context embedding. If the predicted classification label is different from the original label of the target word, the target word may be flagged as an error, and the prediction may be used as the correction.
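A minimal sketch of this detect-and-compare step, assuming a trained model like the DeepContextGEC sketch given later (alongside FIG. 6) that returns a probability distribution over labels; the probability-threshold check follows the description of method 300 below:

```python
import torch

def detect_error(model, fwd_ctx, bwd_ctx, actual_label, threshold=0.5):
    """Flag a grammatical error when the estimated classification differs from the
    actual classification with probability above a predefined threshold."""
    with torch.no_grad():
        probs = model(fwd_ctx, bwd_ctx)[0]       # distribution over labels, shape (num_labels,)
    predicted = int(probs.argmax())
    confidence = float(probs[predicted])
    if predicted != actual_label and confidence > threshold:
        # The prediction doubles as the suggested correction.
        return {"error": True, "correction_label": predicted, "confidence": confidence}
    return {"error": False}
```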
- multiple classification-based GEC modules 108 may be applied in parallel in GEC system 100 to concurrently detect grammatical errors for various grammatical error types.
- the resources of GEC system 100 may be assigned to different grammatical error types based on the occurrence frequencies of each grammatical error type. For example, more computational resources may be allocated by GEC system 100 to handle grammatical error types that occur more frequently than others. The allocation of resources may be dynamically adjusted in view of the frequency change and/or the workload of each classification-based GEC module 108.
- Machine translation-based GEC module 110 is configured to detect one or more grammatical errors in each sentence based on statistical machine translation, such as phrase-based machine translation, neural network-based machine translation, etc.
- machine translation-based GEC module 110 includes a model having a language sub-model assigning a probability for a sentence and a translation sub-model assigning a conditional probability.
- the language sub-model may be trained using a monolingual training data set in the target language.
- the parameters of the translation sub-model may be estimated from a parallel training data set, i.e., the set of foreign sentences and their corresponding translations into the target language.
- machine translation-based GEC module 110 may be applied to the output of classification-based GEC modules 108, or classification-based GEC modules 108 may be applied to the output of machine translation-based GEC module 110. Also, in some embodiments, by adding machine translation-based GEC module 110 into the pipeline, certain classification-based GEC modules 108 that may be outperformed by machine translation-based GEC module 110 may not be included in the pipeline.
- Rule-based GEC module 112 is configured to detect one or more grammatical errors in each sentence based on predefined rules. It is to be appreciated that the position of rule-based GEC module 112 in the pipeline is not limited to the end as shown in FIG. 1, but can be at the beginning of the pipeline as the first detection module or between classification-based GEC modules 108 and machine translation-based GEC module 110. In some embodiments, other mechanical errors, such as punctuation, spelling, and capitalization errors, can be detected and fixed using predefined rules by rule-based GEC module 112 as well.
- Scoring/correction module 114 is configured to provide a corrected text and/or grammar score 122 of input text 116 based on the grammatical error results received from the pipeline. Taking classification-based GEC modules 108 as an example, for each target word that is detected as having a grammatical error because the estimated classification does not match the actual classification, the grammatical error correction of the target word may be provided by scoring/correction module 114 based on the estimated classification of the target word. To evaluate input text 116, scoring/correction module 114 may also provide grammar score 122 based on the grammatical error results received from the pipeline using a scoring function.
- the scoring function may assign weights to each grammatical error type so that grammatical errors of different types may have different levels of impact on grammar score 122. Weights may be assigned to precision and recall as the weighted factors in evaluating the grammatical error results.
- the user from whom input text 116 is received may be considered by the scoring function as well. For example, the weights may be different for different users, or the information of the user (e.g., native language, residency, education level, historical scores, age, etc.) may be factored into the scoring function.
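The disclosure does not give the scoring function itself; the following is one plausible weighted formulation, with all weights and the user factor purely hypothetical:

```python
ERROR_TYPE_WEIGHTS = {           # hypothetical per-type weights
    "subject_verb_agreement": 1.0,
    "verb_form": 1.0,
    "article": 0.6,
    "preposition": 0.8,
    "noun_number": 0.7,
}

def grammar_score(errors, num_target_words, user_factor=1.0, max_score=100.0):
    """Penalize each detected error by its type weight, scaled by a per-user factor."""
    if num_target_words == 0:
        return max_score
    penalty = sum(ERROR_TYPE_WEIGHTS.get(e["type"], 1.0) for e in errors)
    return max(0.0, max_score - user_factor * max_score * penalty / num_target_words)

print(grammar_score([{"type": "verb_form"}, {"type": "preposition"}], num_target_words=20))
```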
- FIG. 2 is a depiction of an example of automated grammatical error correction performed by GEC system 100 in FIG. 1.
- an input text 202 includes a plurality of sentences and is received from a user identified by a user ID -1234.
- a corrected text 204 with a grammar score is provided for the user.
- the verb “adding” is identified as a target word with respect to the verb form error by GEC system 100.
- the actual classification of the target word “adding” is a gerund or present participle.
- GEC system 100 applies ANN model 120 trained for the verb form error and estimates that the classification of the target word “adding” is the base form - “add.” As the estimated classification does not match the actual classification of the target word “adding,” a verb form grammatical error is detected by GEC system 100, which affects the grammar score in view of the weight applied to the verb form error type and/or the personal information of the user.
- the estimated classification of the target word “adding” is also used by GEC system 100 to provide the correction “add” to replace “adding” in corrected text 204.
- the same ANN model 120 for the verb form error is used by GEC system 100 to detect and correct other verb form errors in input text 202, such as “disheart” to “dishearting.”
- ANN models 120 for other grammatical error types are used by GEC system 100 to detect other types of grammatical errors.
- ANN model 120 for the preposition error is used by GEC system 100 to detect and correct preposition errors in input text 202, such as “for” to “in,” and “to” to “on.”
- FIG. 3 is a flow chart illustrating an example of a method 300 for grammatical error correction in accordance with an embodiment.
- Method 300 can be performed by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc. ) , software (e.g., instructions executing on a processing device) , or a combination thereof. It is to be appreciated that not all steps may be needed to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 3, as will be understood by a person of ordinary skill in the art.
- Method 300 shall be described with reference to FIG. 1. However, method 300 is not limited to that example embodiment.
- an input text is received.
- the input text includes at least one sentence.
- the input text may be received directly from, for example, writing, typing, or copying/pasting, or indirectly from, for example, speech recognition or image recognition.
- the received input text is pre-processed, for example, by being divided into sentences and tokenized.
- the pre-processing may include converting uppercase letters into lowercase letters so that the input text is lowercased.
- the pre-processing may include identifying any tokens in the input text that are not in vocabulary database 118 and representing them as special tokens.
- 302 and 304 may be performed by input pre-processing module 102 of GEC system 100.
- the pre-processed input text is parsed to identify one or more target words in each sentence.
- the target words may be identified from the text tokensbased on the grammatical error types so that each target word corresponds to at least one of the grammatical error types.
- the grammatical error types include, but are not limited to, the article error, subject-verb agreement error, verb form error, preposition error, and noun number error.
- the actual classification of each target word with respect to the corresponding grammatical error type is determined. The determination may be automatically made, for example, based on PoS tags and text tokens in the sentence.
- the target word identification and actual classification determination may be performed by NLP tools such as the Stanford CoreNLP tools.
- 306 may be performed by parsing module 104 of GEC system 100.
- each target word is dispatched to corresponding classification-based GEC module 108.
- Each classification-based GEC module 108 includes ANN model 120 trained for a corresponding grammatical error type, for example, over native training samples.
- 308 may be performed by target word dispatching module 106 of GEC system 100.
- one or more grammatical errors in each sentence are detected using ANN models 120.
- a classification of the target word with respect to the corresponding grammatical error type may be estimated using corresponding ANN model 120.
- A grammatical error then may be detected based on the target word and the estimated classification of the target word. For example, if the estimation is different from the original label and the probability is larger than a predefined threshold, then the grammatical error is deemed to be found.
- 310 may be performed by classification-based GEC modules 108 of GEC system 100.
- one or more grammatical errors in each sentence may be detected using machine translation.
- 312 may be performed by machine translation-based GEC module 110 of GEC system 100.
- one or more grammatical errors in each sentence may be detected based on predefined rules.
- 314 may be performed by rule-based GEC module 112 of GEC system 100.
- a pipeline architecture may be used to combine any suitable machine translation and/or predefined rule-based methods with the classification-based methods described herein to further improve the performance of GEC system 100.
- corrections to the detected grammatical errors and/or a grammar score of the input text are provided.
- a weight may be applied to each grammatical error result of target words based on the corresponding grammatical error type.
- the grammar score of each sentence can be determined based on the grammatical error results and the target words in the sentence as well as the weights applied to each grammatical error result.
- the grammar score may be provided based on the information associated with the user from whom the sentence is received as well.
- as to the corrections to the detected grammatical errors, in some embodiments, the estimated classification of the target word with respect to the corresponding grammatical error type may be used to generate the correction. It is to be appreciated that the corrections and grammar score are not necessarily provided together. 316 may be performed by scoring/correction module 114 of GEC system 100.
- FIG. 4 is a block diagram illustrating an example of classification-based GEC module 108 of GEC system 100 in FIG. 1 in accordance with an embodiment.
- classification-based GEC module 108 is configured to receive a target word in a sentence 402 and estimate the classification of the target word using ANN model 120 for the corresponding grammatical error type of the target word.
- the target word in sentence 402 is also received by a target word labeling unit 404 (e.g., in parsing module 104) .
- Target word labeling unit 404 is configured to determine the actual classification (e.g., original label) of the target word based on, for example, PoS tags and text tokens of sentence 402.
- Classification-based GEC module 108 is further configured to provide the grammatical error result based on the estimated classification and actual classification of the target word. As shown in FIG. 4, classification-based GEC module 108 includes an initial context generation unit 406, a deep context representation unit 408, a classification unit 410, an attention unit 412, and a classification comparison unit 414.
- Initial context generation unit 406 is configured to generate a plurality of sets of initial context vectors (initial context matrices) of the target word based on the words surrounding the target word (context words) in sentence 402.
- the initial context vector sets include a set of forward initial context vectors (forward initial context matrix) generated based on at least one word before the target word (forward context words) in sentence 402 and a set of backward initial context vectors (backward initial context matrix) generated based on at least one word after the target word (backward context words) in sentence 402.
- Each initial context vector represents one context word in sentence 402.
- an initial context vector may be a one-hot vector that represents a word based on one-hot encoding so that the size (dimension) of the one-hot vector is the same as the vocabulary size (e.g., in vocabulary database 118) .
- an initial context vector may be a low-dimensional vector with the dimension smaller than the vocabulary size, such as a word embedding vector of a context word.
- a word embedding vector may be generated by any suitable generic word embedding approach, such as but not limited to word2vec or GloVe.
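For instance, a word2vec model could be trained on a native plain-text corpus with gensim (an illustrative tool choice, not one named by the disclosure):

```python
from gensim.models import Word2Vec

# Train 300-dimensional embeddings on tokenized native text (toy corpus here).
corpus = [["i", "go", "to", "school", "everyday"],
          ["she", "goes", "to", "work", "everyday"]]
w2v = Word2Vec(sentences=corpus, vector_size=300, window=5, min_count=1)
print(w2v.wv["school"].shape)  # (300,)
```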
- initial context generation unit 406 may use one or more recurrent neural networks configured to output one or more sets of initial context vectors.
- the recurrent neural network (s) used by initial context generation unit 406 may be part of ANN model 120.
- the number of context words used for generating the set of forward or backward initial context vectors is not limited.
- the set of forward initial context vectors are generated based on all the words before the target word in sentence 402
- the set of backward initial context vectors are generated based on all the words after the target word in sentence 402.
- each classification-based GEC module 108 and corresponding ANN model 120 handle a specific grammatical error type, and correction of different types of grammatical errors may need dependencies from different word distances (e.g., a preposition is determined by the words near the target word, while the status of a verb can be affected by the subject far away from the verb)
- the number of context words used to generate the set of forward or backward initial context vectors (i.e., the window size) may be determined based on the grammatical error type associated with classification-based GEC module 108 and corresponding ANN model 120.
- an initial context vector may be generated based on the lemma of the target word itself.
- A lemma is the base form of a word (e.g., the words “walk,” “walks,” “walked,” and “walking” all have the same lemma “walk”).
- the lemma form of the target noun word may be introduced in the form of an initial lemma context vector as extra context information because whether the target word should be in the singular or plural form is closely related to the word itself.
- the initial context vector of the lemma of the target word may be part of the set of forward initial context vectors or part of the set of backward initial context vectors.
- semantic features otherwise need to be manually designed and extracted from the sentence to generate feature vectors, which can hardly cover all situations due to the complexity of language.
- complex feature engineering is not needed by classification-based GEC module 108 disclosed herein as the context words of the target word in sentence 402 are used directly as the initial context information (e.g., in the form of initial context vectors) , and the deep context feature representation can be learnt jointly with classification in an end-to-end fashion as described below in detail.
- a sentence consists of n words 1-n, including the target word i.
- a corresponding initial context vector 1, 2, ..., or i-1 is generated for each word before the target word i.
- the initial context vectors 1, 2, ..., and i-1 are “forward” vectors as they are generated from the words before the target word i and are to be fed into the later stage in a forward direction (i.e., starting from the first word 1 at the beginning of the sentence).
- a corresponding initial context vector i+1, i+2, ..., or n is generated for each word after the target word i.
- the initial context vectors n, ..., i+2, and i+1 are “backward” vectors as they are generated from the words after the target word i and are to be fed into the later stage in a backward direction (i.e., starting from the last word n at the end of the sentence).
- the set of forward initial context vectors may be represented as a forward initial context matrix having the number of columns the same as the dimension of the word embedding and the number of rows the same as the number of words before the target word i.
- the first row in the forward initial context matrix may be the word embedding vector of the first word 1
- the last row in the forward initial context matrix may be the word embedding vector of the word i-1 immediately before the target word i.
- the set of backward initial context vectors may be represented as a backward initial context matrix having the number of columns the same as the dimension of the word embedding and the number of rows the same as the number of words after the target word i.
- the first row in the backward initial context matrix may be the word embedding vector of the last word n, and the last row in the backward initial context matrix may be the word embedding vector of the word i+1 immediately after the target word i.
- the dimension of each word embedding vector may be at least 100, for example, 300.
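A sketch of building these forward and backward initial context matrices from pretrained word embeddings; the dictionary-based embedding lookup and zero-vector OOV handling are assumptions:

```python
import numpy as np

def initial_context_matrices(tokens, target_index, embeddings, window=None, dim=300):
    """Build the forward and backward initial context matrices for a target word.

    embeddings: dict mapping word -> np.ndarray of shape (dim,), e.g., GloVe vectors.
    window:     optional window size; None means use the whole sentence.
    """
    def embed(word):
        return embeddings.get(word, np.zeros(dim))   # OOV words get a zero vector here

    before = tokens[:target_index]
    after = tokens[target_index + 1:]
    if window is not None:
        before, after = before[-window:], after[:window]

    # Forward matrix: first context word down to the word just before the target.
    forward = np.stack([embed(w) for w in before]) if before else np.zeros((0, dim))
    # Backward matrix: last word of the sentence down to the word just after the target.
    backward = np.stack([embed(w) for w in reversed(after)]) if after else np.zeros((0, dim))
    return forward, backward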
- a lemma initial context vector lem (e.g., a word embedding vector of the lemma of the target word i) may be generated as well.
- deep context representation unit 408 is configured to provide, using ANN model 120, a context vector of the target word based on the context words in sentence 402, for example, the sets of forward and backward initial context vectors generated by initial context generation unit 406.
- Classification unit 410 is configured to provide, using ANN model 120, a classification value of the target word with respect to the grammatical error type based on the deep context representation of the target word in sentence 402, for example, the context vector generated by deep context representation unit 408.
- ANN model 120 includes a deep context representation sub-model 602 that can be used by deep context representation unit 408 and a classification sub-model 604 that can be used by classification unit 410.
- Deep context representation sub-model 602 and classification sub-model 604 may be jointly trained in an end-to-end fashion.
- Deep context representation sub-model 602 includes two recurrent neural networks: a forward recurrent neural network 606 and a backward recurrent neural network 608.
- Each recurrent neural network 606 or 608 may be a long short-term memory (LSTM) neural network, a gated recurrent unit (GRU) neural network, or any other suitable recurrent neural networks where connections between the hidden units form a directed cycle.
- Recurrent neural networks 606 and 608 are configured to output a context vector of the target word based on the initial context vectors generated from the context words of the target word in sentence 402.
- forward recurrent neural network 606 is configured to receive the set of forward initial context vectors and provide a forward context vector of the target word based on the set of forward initial context vectors.
- Forward recurrent neural network 606 may be fed with the set of forward initial context vectors in the forward direction.
- Backward recurrent neural network 608 is configured to receive the set of backward initial context vectors and provide a backward context vector of the target word based on the set of backward initial context vectors.
- Backward recurrent neural network 608 may be fed with the set of backward initial context vectors in the backward direction.
- the sets of forward and backward initial context vectors may be word embedding vectors as described above. It is to be appreciated that, in some embodiments, the lemma initial context vector of the target word may be fed into forward recurrent neural network 606 and/or backward recurrent neural network 608 to generate the forward context vector and/or backward context vector.
- the forward recurrent neural network is fed with the set of forward initial context vectors (e.g., in the form of the forward initial context matrix) in the forward direction and generates a forward context vector, denoted for.
- the backward recurrent neural network is fed with the set of backward initial context vectors (e.g., in the form of the backward initial context matrix) in the backward direction and generates a backward context vector, denoted back.
- the lemma initial context vector lem may be fed into the forward recurrent neural network and/or the backward recurrent neural network.
- the number of hidden units in each of the forward and backward recurrent neural networks is at least 300, for example, 600.
- a deep context vector i of the target word i is then generated by concatenating the forward context vector for and the backward context vector back.
- the deep context vector i represents the deep context information of the target word i based on the context words 1 to i-1 and the context words i+1 to n surrounding the target word i (and the lemma of the target word i in some embodiments).
- the deep context vector i may be considered as the embedding of the joint sentential context around the target word i.
- the deep context vector i is a generic representation that can handle various situations, as no complex feature engineering is needed to manually design and extract semantic features for representing the context of the target word i.
- classification sub-model 604 includes a feedforward neural network 610 configured to output the classification value of the target word with respect to the grammatical error type based on the context vector of the target word.
- Feedforward neural network 610 may include a multi-layer perceptron (MLP) neural network or any other suitable feedforward neural network where connections between the hidden units do not form a cycle.
- the deep context vector i is fed into the feedforward neural network to generate the classification value y of the target word i.
- the classification value y can be defined in different ways as shown in TABLE I.
- the grammatical error type is not limited to the five examples in TABLE I, and the definition of the classification value y is also not limited by the examples shown in TABLE I. It is also to be appreciated that in some embodiments, the classification value y may be represented as a probability distribution of the target word over the classes (labels) associated with the grammatical error type.
- feedforward neural network 610 may include a first layer having a first activation function of a fully connected linear operation on the context vector.
- the first activation function in the first layer may be, for example, the rectified linear unit activation function, or any other suitable activation function that is a function of the output from the previous layer(s).
- Feedforward neural network 610 may also include a second layer connected to the first layer and having a second activation function for generating the classification value.
- the second activation function in the second layer may be, for example, the softmax activation function, or any other suitable activation functions used for multiclass classification.
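Putting deep context representation sub-model 602 and classification sub-model 604 together, here is a minimal PyTorch sketch consistent with the description (two GRUs, then a two-layer MLP with ReLU and softmax). The class name and default sizes (300-dimensional embeddings and 600 hidden units, per the examples in the text) are illustrative, not the patent's exact architecture:

```python
import torch
import torch.nn as nn

class DeepContextGEC(nn.Module):
    """Two GRUs embed the forward/backward context; an MLP classifies the target word."""

    def __init__(self, embed_dim=300, hidden=600, num_labels=2):
        super().__init__()
        self.forward_gru = nn.GRU(embed_dim, hidden, batch_first=True)
        self.backward_gru = nn.GRU(embed_dim, hidden, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(2 * hidden, hidden),  # first layer: fully connected linear operation
            nn.ReLU(),                      # first activation: rectified linear unit
            nn.Linear(hidden, num_labels),  # second layer
        )

    def forward(self, fwd_ctx, bwd_ctx):
        # fwd_ctx: (1, len_before, embed_dim), fed from the sentence start toward the target
        # bwd_ctx: (1, len_after, embed_dim), fed from the sentence end toward the target
        _, h_for = self.forward_gru(fwd_ctx)
        _, h_back = self.backward_gru(bwd_ctx)
        context = torch.cat([h_for[-1], h_back[-1]], dim=-1)  # deep context vector
        return torch.softmax(self.mlp(context), dim=-1)       # second activation: softmax
```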
- attention unit 412 is configured to provide, using ANN model 120, a context weight vector of the target word based on at least one word before the target word and at least one word after the target word in sentence 402.
- FIG. 7 is a schematic diagram illustrating another example of ANN model 120 for grammatical error correction in accordance with an embodiment. Compared with the example shown in FIG. 6, ANN model 120 in FIG. 7 further includes an attention mechanism sub-model 702 that can be used by attention unit 412. The weighted context vector is then computed by applying the context weight vector to the context vector. Deep context representation sub-model 602, classification sub-model 604, and attention mechanism sub-model 702 may be jointly trained in an end-to-end fashion.
- attention mechanism sub-model 702 includes a feedforward neural network 704 configured to generate the context weight vector of the target word based on the context words of the target word.
- Feedforward neural network 704 may be trained based on the distances between each context word to the target word in the sentence.
- the sets of initial context vectors can be generated based on all the surrounding words in the sentence, and the context weight vector can tune the weighted context vector to focus on those context words that affect grammatical usage.
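A sketch of such an attention mechanism over per-word context representations follows; the exact formulation (e.g., how word-to-target distances enter the scoring) is not given in this excerpt, so a common additive scoring variant is assumed:

```python
import torch
import torch.nn as nn

class ContextAttention(nn.Module):
    """Feedforward network producing a weight per context position (cf. sub-model 702).

    This additive variant scores each per-word GRU output and normalizes with softmax;
    the patent's actual attention formulation may differ.
    """

    def __init__(self, hidden=600):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(hidden, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, gru_outputs):
        # gru_outputs: (1, seq_len, hidden) per-word context representations
        weights = torch.softmax(self.score(gru_outputs), dim=1)  # context weight vector
        return (weights * gru_outputs).sum(dim=1)                # weighted context vector
```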
- classification comparison unit 414 is configured to compare the estimated classification value provided by classification unit 410 with the actualclassification value provided by target word labeling unit 404 to detect the presence of any error of the grammatical error type. If the actual classification value is the same as the estimated classification value, then no error of the grammatical error type is detected for the target word. Otherwise, an error of the grammatical error type is detected, and the estimated classification value is used to provide the correction.
- the estimated classification value of the target word “adding” with respect to the verb form error is “0” (base form)
- the actual classification value of the target word “adding” is “1” (gerund or present participle).
- a verb form error is detected, and the correction is the base form of the target word “adding.”
- FIG. 8 is a detailed schematic diagram illustrating an example of ANN model 120 in FIG. 6 in accordance with an embodiment.
- ANN model 120 includes a forward GRU neural network, a backward GRU neural network, and an MLP neural network that are jointly trained.
- the forward context word “I” is fed to the forward GRU neural network from left to right (the forward direction)
- the backward context words “to school everyday” are fed to the backward GRU neural network from right to left (the backward direction) .
- the context vector for the target word w_i can be defined as Equation 1:
- context (w_i) = [lGRU (w_1, ..., w_(i-1)) ; rGRU (w_n, ..., w_(i+1))] (Equation 1)
- where lGRU is a GRU reading the words from left to right (the forward direction) in a given context, rGRU is a GRU reading the words in reverse, from right to left (the backward direction), and distinct left-to-right and right-to-left word embeddings of the context words are used on the two sides.
- the concatenated vector is fed to the MLP neural network to capture the inter-dependencies of the two sides.
- a softmax layer may be used to predict the classification of the target word (e.g., the target word itself or the status of the target word, e.g., singular or plural), for example:
- y = softmax (MLP (context (w_i)))
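An illustrative run tying the earlier sketches together for the FIG. 8 example sentence “I go to school everyday” with target word “go”; random tensors stand in for real word embeddings:

```python
import torch

model = DeepContextGEC(embed_dim=300, hidden=600, num_labels=2)  # from the sketch above
fwd_ctx = torch.randn(1, 1, 300)   # embedding of the forward context word "I"
bwd_ctx = torch.randn(1, 3, 300)   # "everyday", "school", "to" fed in the backward direction
probs = model(fwd_ctx, bwd_ctx)
print(probs)                        # distribution over {base form, third person singular}
```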
- FIG. 9 is a flow chart illustrating an example of a method 900 for grammatical error correction of a sentence in accordance with an embodiment.
- Method 900 can be performed by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc. ) , software (e.g., instructions executing on a processing device) , or a combination thereof. It is to be appreciated that not all steps may be needed to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 9, as will be understood by a person of ordinary skill in the art.
- Method 900 shall be described with reference to FIGs. 1 and 4. However, method 900 is not limited to that example embodiment.
- a sentence is received.
- the sentence may be part of an input text.
- 902 may be performed by input pre-processing module 102 of GEC system 100.
- one or more target words in the sentence are identified based on one or more grammatical error types. Each target word corresponds to one or more grammatical error types.
- 904 may be performed by parsing module 104 of GEC system 100.
- a classification of one target word with respect to the corresponding grammatical error type is estimated using ANN model 120 trained for the grammatical error type.
- a grammatical error is detected based on the target word and the estimated classification of the target word.
- the detection may be made by comparing the actual classification of the target word with the estimated classification of the target word.
- 906 and 908 may be performed by classification-based GEC module 108 of GEC system 100.
- method 900 moves back to 904 to process the next target word in the sentence.
- grammatical error corrections to the sentence are provided based on the grammatical error result.
- the estimated classifications of each target word may be used for generating the grammatical error corrections.
- a grammar score may be provided based on the grammatical error result as well.
- 912 may be performed by scoring/correction module 114 of GEC system 100.
- FIG. 10 is a flow chart illustrating an example of a method 1000 for classifying a target word with respect to a grammatical error type in accordance with an embodiment.
- Method 1000 can be performed by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc. ) , software (e.g., instructions executing on a processing device) , or a combination thereof. It is to be appreciated that not all steps may be needed to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 10, as will be understood by a person of ordinary skill in the art.
- Method 1000 shall be described with reference to FIGs. 1 and 4. However, method 1000 is not limited to that example embodiment.
- a context vector of a target word is provided based on the context words in the sentence.
- the context words may be any number of words surrounding the target word in the sentence.
- the context words include all the words in the sentence except the target word.
- the context words include the lemma of the target word as well.
- the context vector does not include semantic features extracted from the sentence.
- 1002 may be performed by deep context representation unit 408 of classification-based GEC module 108.
- a context weight vector is provided based on the context words in the sentence.
- the context weight vector is applied to the context vector to generate a weighted context vector.
- the context weight vector may apply a respective weight to each context word in the sentence based on the distance of the context word to the target word.
- 1004 and 1006 may be performed by attention unit 412 of classification-based GEC module 108.
- a classification value of the target word with respect to the grammatical error type is provided based on the weighted context vector of the target word.
- the classification value represents one of the multiple classes associated with a grammatical error type.
- the classification value may be a probability distribution of the target word over the classes associated with the grammatical error type. 1008 may be performed by classification unit 410 of classification-based GEC module 108.
- FIG. 11 is a flow chart illustrating another example of a method 1100 for classifying a target word with respect to a grammatical error type in accordance with an embodiment.
- Method 1100 can be performed by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc. ) , software (e.g., instructions executing on a processing device) , or a combination thereof. It is to be appreciated that not all steps may be needed to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 11, as will be understood by a person of ordinary skill in the art.
- the grammatical error type of a target word is determined, for example, from a plurality of predefined grammatical error types.
- the window size of the context words is determined based on the grammatical error type.
- the window size indicates the maximum number of words before the target word and the maximum number of words after the target word in the sentence to be considered as the context words.
- the window size may vary for different grammatical error types. For example, for the subject-verb agreement and verb form errors, the entire sentence may be considered as the context since these two error types usually require dependencies from context words that are far away from the target word.
- the window size may be smaller than the entire sentence, such as 3, 5, or 10 for the article error, 3, 5, or 10 for the preposition error, and 10, 15, or 20 for the noun number error.
- a set of forward word embedding vectors are generated based on the context words before the target word.
- the dimension of each forward word embedding vector may be at least 100, such as 300.
- the order in which the set of forward word embedding vectors are generated may be from the first word within the window size to the word immediately before the target word (the forward direction) .
- a set of backward word embedding vectors are generated based on the context words after the target word.
- the dimension of each backward word embedding vector may be at least 100, such as 300.
- the order in which the set of backward word embedding vectors are generated may be from the last word within the window size to the word immediately after the target word (the backward direction).
- 1102, 1104, 1106, and 1108 may be performed by initial context generation unit 406 of classification-based GEC module 108.
- a forward context vector is provided based on the set of forward word embedding vectors.
- the set of forward word embedding vectors may be fed to a recurrent neural network following the order from the forward word embedding vector of the first word within the window size to the forward word embedding vector of the word immediately before the target word (the forward direction) .
- a backward context vector is provided based on the set of backward word embedding vectors.
- the set of backward word embedding vectors may be fed to another recurrent neural network following the order from the backward word embedding vector of the last word within the window size to the backward word embedding vector of the word immediately after the target word (the backward direction) .
- a context vector is provided by concatenating the forward context vector and the backward context vector. 1110, 1112, and 1114 may be performed by deep context representation unit 408 of classification-based GEC module 108.
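- A minimal sketch of steps 1110-1114 follows, assuming GRU recurrent neural networks (as recited in claim 46), 300-dimensional embeddings, and 300 hidden units; the class name DeepContext and other implementation details are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DeepContext(nn.Module):
    """Two GRUs read the forward and backward embedding sequences; their
    final hidden states are concatenated into the context vector."""
    def __init__(self, embed_dim=300, hidden=300):
        super().__init__()
        self.fwd_rnn = nn.GRU(embed_dim, hidden, batch_first=True)
        self.bwd_rnn = nn.GRU(embed_dim, hidden, batch_first=True)

    def forward(self, fwd_embeds, bwd_embeds):
        # fwd_embeds: (batch, n_before, embed_dim), ordered from the first
        # word in the window to the word immediately before the target.
        # bwd_embeds: (batch, n_after, embed_dim), ordered from the last
        # word in the window to the word immediately after the target.
        _, h_fwd = self.fwd_rnn(fwd_embeds)  # final hidden: (1, batch, hidden)
        _, h_bwd = self.bwd_rnn(bwd_embeds)
        return torch.cat([h_fwd[-1], h_bwd[-1]], dim=-1)  # (batch, 2*hidden)

# Example: 4 words before and 6 words after the target word.
ctx = DeepContext()(torch.randn(1, 4, 300), torch.randn(1, 6, 300))
print(ctx.shape)  # torch.Size([1, 600])
```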
- a fully connected linear operation is applied to the context vector.
- an activation function of a first layer, for example, of an MLP neural network, is applied to the output of the fully connected linear operation.
- the activation function may be the rectified linear unit activation function.
- another activation function of a second layer, for example, of the MLP neural network, is applied to the output of the activation function of the first layer to generate a classification value of the target word with respect to the grammatical error type.
- Multiclass classification of the target word with respect to the grammatical error type may be performed based on the context vector by the MLP neural network in 1116, 1118, and 1120.
- 1116, 1118, and 1120 may be performed by classification unit 410 of classification-based GEC module 108.
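- A minimal sketch of the multiclass classification of 1116-1120 follows, assuming a ReLU first-layer activation and a softmax second-layer activation that yields a probability distribution over the classes; the layer sizes and the class name ClassificationHead are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """A two-layer MLP: a fully connected linear operation on the context
    vector with a ReLU activation (1116-1118), then a second layer whose
    softmax activation yields a probability distribution over the classes
    of the grammatical error type (1120)."""
    def __init__(self, context_dim=600, hidden=300, num_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(context_dim, hidden),  # fully connected linear operation
            nn.ReLU(),                       # first-layer activation
            nn.Linear(hidden, num_classes),
            nn.Softmax(dim=-1),              # second-layer activation
        )

    def forward(self, context_vector):
        return self.net(context_vector)      # classification value y'

# Example: the noun number error has 2 classes (0 = singular, 1 = plural).
probs = ClassificationHead()(torch.randn(1, 600))
print(probs.sum().item())  # ~1.0
```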
- FIG. 12 is a flow chart illustrating an example of a method 1200 for providing a grammar score in accordance with an embodiment.
- Method 1200 can be performed by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc. ) , software (e.g., instructions executing on a processing device) , or a combination thereof. It is to be appreciated that not all steps may be needed to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 12, as will be understood by a person of ordinary skill in the art.
- a user factor is determined based on information about the user.
- the information includes, for example, native language, residency, education level, age, historical scores, etc.
- weights of precision and recall are determined. Precision and recall are commonly used in combination as the main evaluation measure for GEC.
- the precision P and recall R are defined as follows:
  P = |g ∩ e| / |e| ,  R = |g ∩ e| / |g|
- g is the set of gold-standard edits of two human annotators for a specific grammatical error type, and e is the corresponding set of system edits. There may be overlaps between many other grammatical error types and the verb form error type, so g may be based on the annotations of all grammatical error types when calculating the verb form error performance. Weights between precision and recall may be adjusted when combining them together as the evaluation measure. For example, F0.5, defined in Equation 5 as F0.5 = (1 + 0.5^2) · P · R / (0.5^2 · P + R) , combines both precision and recall while assigning twice as much weight to precision, which is useful when accurate feedback is more important than coverage in some embodiments.
- F_n, where n is between 0 and 1, may be applied in other examples.
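- For illustration, the sketch below computes P, R, and F_beta from edit sets following the definitions above; treating g and e as flat sets (rather than per-annotator, per-sentence edits) is a simplifying assumption.

```python
def precision_recall_f(gold_edits, system_edits, beta=0.5):
    """Compute P, R, and F_beta from a set of gold-standard edits g and a
    set of system edits e, per the definitions above. beta = 0.5 assigns
    twice as much weight to precision as to recall."""
    g, e = set(gold_edits), set(system_edits)
    p = len(g & e) / len(e) if e else 0.0
    r = len(g & e) / len(g) if g else 0.0
    if p + r == 0.0:
        return p, r, 0.0
    f = (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
    return p, r, f

# Example: 2 of 3 system edits match the 4 gold-standard edits.
print(precision_recall_f({"e1", "e2", "e3", "e4"}, {"e1", "e2", "e9"}))
# (0.666..., 0.5, 0.625)
```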
- the weights for different grammatical error types may vary as well.
- a scoring function is obtained based on the user factor and the weights.
- the scoring function may use the user factor and weights (either the same or different for different grammatical error types) as parameters.
- the grammatical error results of each target word in the sentence are received.
- a grammar score is provided based on the grammatical error results and the scoring function.
- Grammatical error results may be the variables of the scoring function, and the user factor and weights may be the parameters of the scoring function. 1202, 1204, 1206, 1208, and 1210 may be performed by scoring/correction module 114 of GEC system 100.
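- The sketch below shows one hypothetical form such a scoring function could take, with the error results as variables and the per-type weights and user factor as parameters; the linear-penalty form, the 100-point scale, and the helper name grammar_score are assumptions, not the disclosed scoring function.

```python
def grammar_score(error_results, type_weights, user_factor=1.0):
    """Hypothetical scoring function for 1208-1210: error_results maps each
    grammatical error type to the number of detected errors; type_weights
    and user_factor are the parameters of the scoring function."""
    penalty = sum(type_weights[t] * n for t, n in error_results.items())
    return max(0.0, 100.0 - user_factor * penalty)

# Example: two article errors and one noun number error; a user factor
# below 1 softens the penalty (e.g., for a less proficient learner).
score = grammar_score({"article": 2, "noun_number": 1},
                      {"article": 5.0, "noun_number": 8.0},
                      user_factor=0.8)
print(score)  # 85.6
```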
- FIG. 13 is a block diagram illustrating an ANN model training system 1300 in accordance with an embodiment.
- ANN model training system 1300 includes a model training module 1302 configured to train each ANN model 120 for a specific grammatical error type over a set of training samples 1304 based on an objective function 1306 using a training algorithm 1308.
- each training sample 1304 may be a native training sample.
- a native training sample as disclosed herein includes a sentence without a grammatical error, as opposed to a learner training sample that includes a sentence with one or more grammatical errors.
- ANN model training system 1300 can utilize the abundant native plain text corpora as training samples 1304 to more effectively and efficiently train ANN model 120.
- training samples 1304 may be obtained from a Wikipedia dump.
- training samples 1304 for ANN model training system 1300 are not limited to native training samples.
- ANN model training system 1300 may train ANN model 120 using learner training samples or the combination of native training samples and learner training samples.
- FIG. 14 is a depiction of an example of a training sample 1304 used by ANN model training system 1300 in FIG. 13.
- a training sample includes a sentence that is associated with one or more grammatical error types 1, ..., n.
- although the training sample may be a native training sample without a grammatical error, the sentence can still be associated with grammatical error types because, as described above, a particular word is associated with one or more grammatical error types, for example, based on its PoS tag.
- the sentence may be associated with, for example, the verb form and subjective agreement errors.
- One or more target words 1, ..., m may be associated with each grammatical error type.
- all the verbs in a sentence are target words with respect to the verb form or subjective agreement error in a training sample.
- For each target word, the training sample is further associated with two pieces of information: the word embedding vector set (matrix) x and the actual classification value y.
- the word embedding vector set x may be generated based on the context words of the target word in the sentence. It is to be appreciated that in some embodiments, the word embedding vector set x may be any other initial context vector set, such as a one-hot vector set.
- the actual classification value y may be one of the class labels with respect to a specific grammatical error type, such as “0” for singular and “1” for plural with respect to the noun number error.
- the training sample thus includes one or more pairs of a word embedding vector set x and an actual classification value y, each pair corresponding to a target word with respect to a grammatical error type in the sentence.
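- As an illustration of the actual classification value y for a native training sample, the class of a target noun with respect to the noun number error can be read off its part-of-speech tag in the well-formed sentence (0 = singular, 1 = plural, as noted above); the Penn Treebank tag names below are an assumption for illustration.

```python
# 0 = singular, 1 = plural, per the noun number classes described above.
# The Penn Treebank tag names are an assumption for illustration; tagging
# is assumed to be done upstream (e.g., during pre-processing and parsing).
POS_TO_NOUN_NUMBER = {"NN": 0, "NNP": 0, "NNS": 1, "NNPS": 1}

def noun_number_label(pos_tag):
    """Actual classification value y of a target noun in a native sentence."""
    return POS_TO_NOUN_NUMBER[pos_tag]

print(noun_number_label("NNS"))  # 1 (plural)
```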
- ANN model 120 includes a plurality of parameters that can be jointly adjusted by model training module 1302 when being fed with training samples 1304.
- Model training module 1302 jointly adjusts the parameters of ANN model 120 to minimize objective function 1306 over training samples 1304 using training algorithm 1308.
- the objective function for training ANN model 120 measures the differences between the estimated classification values and the actual classification values of the target words over training samples 1304.
- Training algorithm 1308 may be any suitable iterative optimization algorithm for finding the minimum of objective function 1306, including gradient descent algorithms (e.g., the stochastic gradient descent algorithm) .
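- A minimal training-loop sketch combining 1306 and 1308 follows, assuming objective function 1306 is a cross-entropy loss over the (x, y) pairs and training algorithm 1308 is plain stochastic gradient descent; the batching, epoch count, learning rate, and helper names are illustrative assumptions.

```python
import torch

def train(model, pairs, epochs=5, lr=0.01):
    """Stochastic gradient descent on a cross-entropy objective over
    (x, y) pairs, where x is the (batched) context input and y holds the
    actual class indices. All hyperparameters are illustrative."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    nll = torch.nn.NLLLoss()  # on log-probabilities, this is cross-entropy
    for _ in range(epochs):
        for x, y in pairs:
            opt.zero_grad()
            probs = model(x)                 # estimated classification y'
            loss = nll(torch.log(probs), y)  # difference between y' and y
            loss.backward()                  # gradients for all parameters
            opt.step()                       # jointly adjusts every sub-model
    return model
```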
- FIG. 15 is a flow chart illustrating an example of a method 1500 for ANN model training for grammatical error correction in accordance with an embodiment.
- Method 1500 can be performed by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc. ) , software (e.g., instructions executing on a processing device) , or a combination thereof. It is to be appreciated that not all steps may be needed to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 15, as will be understood by a person of ordinary skill in the art.
- Method 1500 shall be described with reference to FIG. 13. However, method 1500 is not limited to that example embodiment.
- an ANN model for a grammatical error type is provided.
- the ANN model is for estimating a classification of a target word in a sentence with respect to the grammatical error type.
- the ANN model may be any of the ANN models disclosed herein, for example, the ones illustrated in FIGs. 6 and 7.
- the ANN model may include two recurrent neural networks configured to output a context vector of the target word based on at least one word before the target word and at least one word after the target word in the sentence.
- the context vector does not include a semantic feature of the sentence in the training sample.
- the ANN model may include deep context representation sub-model 602 that can be parameterized as forward recurrent neural network 606 and backward recurrent neural network 608.
- the ANN model may also include a feedforward neural network configured to output a classification value of the target word based on the context vector of the target word.
- the ANN model may include classification sub-model 604 that can be parameterized as feedforward neural network 610.
- a training sample set is obtained.
- Each training sample includes a sentence having a target word and an actual classification of the target word with respect to the grammatical error type.
- the training sample may include a word embedding matrix of the target word that includes a set of forward word embedding vectors and a set of backward word embedding vectors.
- Each forward word embedding vector is generated based on a respective context word before the target word, and each backward word embedding vector is generated based on a respective context word after the target word.
- the number of dimensions of each word embedding vector may be at least 100, such as 300.
- the parameters of the ANN model are jointly adjusted, for example, in an end-to-end fashion.
- a first set of parameters of deep context representation sub-model 602, associated with recurrent neural networks 606 and 608, are jointly adjusted with a second set of parameters of classification sub-model 604, associated with feedforward neural network 610, based on differences between the actual classifications and estimated classifications of the target words in each training sample.
- the parameters associated with forward recurrent neural network 606 are separate from the parameters associated with backward recurrent neural network 608.
- the ANN model may also include attention mechanism sub-model 702 that can be parameterized as feedforward neural network 610.
- the parameters of attention mechanism sub-model 702, associated with feedforward neural network 610, may be jointly adjusted with other parameters of the ANN model as well.
- the parameters of the ANN model are jointly adjusted to minimize the differences between the actual classifications and estimated classifications of the target words in each training sample based on objective function 1306 using training algorithm 1308. 1502, 1504, and 1506 may be performed by model training module 1302 of ANN model training system 1300.
- FIG. 16 is a schematic diagram illustrating an example of training ANN model 120 for grammatical error correction in accordance with an embodiment.
- ANN model 120 is trained over training samples 1304 with respect to a specific grammatical error type.
- Training samples 1304 may be from native text and pre-processed and parsed as described above with respect to FIG. 1.
- Each training sample 1304 includes a sentence having a target word with respect to the grammatical error type and the actual classification of the target word with respect to the grammatical error type.
- a pair including the word embedding matrix x of the target word and the actual classification value y of the target word may be obtained for each training sample 1304.
- the word embedding matrix x may include a set of forward word embedding vectors generated based on the context words before the target word and a set of backward word embedding vectors generated based on the context words after the target word.
- Training samples 1304 thus may include a plurality of (x, y) pairs.
- ANN model 120 may include a plurality of recurrent neural networks 1-n 1602 and a plurality of feedforward neural networks 1-m 1604. Each of neural networks 1602 and 1604 is associated with a set of parameters to be trained over training samples 1304 based on objective function 1306 using training algorithm 1308.
- Recurrent neural networks 1602 may include a forward recurrent neural network and a backward recurrent neural network configured to output a context vector of the target word based on the context words of the target word.
- recurrent neural networks 1602 may further include another one or more recurrent neural networks configured to generate the word embedding matrix of the target word based on the context words of the target word.
- Feedforward neural networks 1604 may include a feedforward neural network configured to output a classification value y’ of the target word based on the context vector of the target word. In some embodiments, feedforward neural networks 1604 may also include another feedforward neural network configured to output a context weight vector to be applied to the context vector. Neural networks 1602 and 1604 may be connected so that they can be jointly trained in an end-to-end fashion. In some embodiments, the context vector does not include a semantic feature of the sentence in training sample 1304.
- the word embedding matrix x of the target word in corresponding training sample 1304 may be fed into ANN model 120, passing through neural networks 1602 and 1604.
- the estimated classification value y’ may be outputted from the output layer (e.g., part of a feedforward neural network 1604) of ANN model 120.
- the estimated classification value y’ and the actual classification value y of the target word in corresponding training sample 1304 may be sent to objective function 1306, and the difference between the estimated classification value y’ and the actual classification value y may be used by objective function 1306 and training algorithm 1308 to jointly adjust each set of parameters associated with each of neural networks 1602 and 1604 in ANN model 120.
- Various embodiments can be implemented, for example, using one or more computer systems, such as computer system 1700 shown in FIG. 17.
- One or more computer systems 1700 can be used, for example, to implement method 300 of FIG. 3, method 900 of FIG. 9, method 1000 of FIG. 10, method 1100 of FIG. 11, method 1200 of FIG. 12, and method 1500 of FIG. 15.
- computer system 1700 can detect and correct grammatical errors and/or train an artificial neural network model for detecting and correcting grammatical errors, according to various embodiments.
- Computer system 1700 can be any well-known computer capable of performing the functions described herein.
- Computer system 1700 includes one or more processors (also called central processing units, or CPUs) , such as a processor 1704.
- processor 1704 is connected to a communication infrastructure or bus 1706.
- One or more processors 1704 may each be a graphics processing unit (GPU) .
- a GPU is a processor that is a specialized electronic circuit designed to process mathematically intensive applications.
- the GPU may have a parallel structure that is efficient for parallel processing of large blocks of data, such as mathematically intensive data common to computer graphics applications, images, videos, etc.
- Computer system 1700 also includes user input/output device (s) 1703, such as monitors, keyboards, pointing devices, etc., that communicate with communication infrastructure 1706 through user input/output interface (s) 1702.
- Computer system 1700 also includes a main or primary memory 1708, such as random access memory (RAM) .
- Main memory 1708 may include one or more levels of cache.
- Main memory 1708 has stored therein control logic (i.e., computer software) and/or data.
- Computer system 1700 may also include one or more secondary storage devices or memory 1710.
- Secondary memory 1710 may include, for example, a hard disk drive 1712 and/or a removable storage device or drive 1714.
- Removable storage drive 1714 may be a floppy disk drive, a magnetic tape drive, a compact disk drive, an optical storage device, a tape backup device, and/or any other storage device/drive.
- Removable storage drive 1714 may interact with a removable storage unit 1718.
- Removable storage unit 1718 includes a computer usable or readable storage device having stored thereon computer software (control logic) and/or data.
- Removable storage unit 1718 may be a floppy disk, magnetic tape, compact disk, DVD, optical storage disk, and/or any other computer data storage device.
- Removable storage drive 1714 reads from and/or writes to removable storage unit 1718 in a well-known manner.
- secondary memory 1710 may include other means, instrumentalities or other approaches for allowing computer programs and/or other instructions and/or data to be accessed by computer system 1700.
- Such means, instrumentalities or other approaches may include, for example, a removable storage unit 1722 and an interface 1720.
- the removable storage unit 1722 and the interface 1720 may include a program cartridge and cartridge interface (such as that found in video game devices) , a removable memory chip (such as an EPROM or PROM) and associated socket, a memory stick and USB port, a memory card and associated memory card slot, and/or any other removable storage unit and associated interface.
- Computer system 1700 may further include a communication or network interface 1724.
- Communication interface 1724 enables computer system 1700 to communicate and interact with any combination of remote devices, remote networks, remote entities, etc. (individually and collectively referenced by reference number 1728) .
- communication interface 1724 may allow computer system 1700 to communicate with remote devices 1728 over communication path 1726, which may be wired and/or wireless, and which may include any combination of LANs, WANs, the Internet, etc. Control logic and/or data may be transmitted to and from computer system 1700 via communication path 1726.
- a tangible apparatus or article of manufacture comprising a tangible computer useable or readable medium having control logic (software) stored thereon is also referred to herein as a computer program product or program storage device.
- The control logic, when executed by one or more data processing devices (such as computer system 1700) , causes such data processing devices to operate as described herein.
Description
Error Type | Classification Values y |
---|---|
Article | 0 = a/an, 1 = the, 2 = none |
Preposition | label = preposition index |
Verb form | 0 = base form, 1 = gerund or present participle, 2 = past participle |
Subjective agreement | 0 = non-3rd person singular present, 1 = 3rd person singular present |
Noun number | 0 = singular, 1 = plural |
Claims (70)
- A method for grammatical error detection, comprising: receiving, by at least one processor, a sentence; identifying, by the at least one processor, one or more target words in the sentence based, at least in part, on one or more grammatical error types, wherein each of the one or more target words corresponds to at least one of the one or more grammatical error types; for at least one of the one or more target words, estimating, by the at least one processor, a classification of the target word with respect to the corresponding grammatical error type using an artificial neural network model trained for the grammatical error type, wherein the model comprises (i) two recurrent neural networks configured to output a context vector of the target word based, at least in part, on at least one word before the target word and at least one word after the target word in the sentence, and (ii) a feedforward neural network configured to output a classification value of the target word with respect to the grammatical error type based, at least in part, on the context vector of the target word; and detecting, by the at least one processor, a grammatical error in the sentence based, at least in part, on the target word and the estimated classification of the target word.
- The method of claim 1, the estimating further comprising: providing the context vector of the target word based, at least in part, on the at least one word before the target word and the at least one word after the target word in the sentence using the two recurrent neural networks; and providing the classification value of the target word with respect to the grammatical error type based, at least in part, on the context vector of the target word using the feedforward neural network.
- The method of claim 2, wherein the context vector of the target word is provided based, at least in part, on a lemma of the target word.
- The method of claim 2, the estimating further comprising: generating a first set of word embedding vectors, wherein each word embedding vector in the first set of word embedding vectors is generated based, at least in part, on a respective one of the at least one word before the target word in the sentence; and generating a second set of word embedding vectors, wherein each word embedding vector in the second set of word embedding vectors is generated based, at least in part, on a respective one of the at least one word after the target word in the sentence.
- The method of claim 4, wherein the number of dimensions of each word embedding vector is at least 100.
- The method of claim 1, wherein: the at least one word before the target word comprises all words before the target word in the sentence; and the at least one word after the target word comprises all words after the target word in the sentence.
- The method of claim 1, wherein the number of the at least one word before the target word and/or the number of the at least one word after the target word are determined based, at least in part, on the grammatical error type.
- The method of claim 2, the estimating further comprising: providing a context weight vector of the target word based, at least in part, on the at least one word before the target word and the at least one word after the target word in the sentence; and applying the context weight vector to the context vector.
- The method of claim 4, the providing of the context vector further comprising: providing a first context vector of the target word based, at least in part, on the first set of word embedding vectors using a first one of the two recurrent neural networks; providing a second context vector of the target word based, at least in part, on the second set of word embedding vectors using a second one of the two recurrent neural networks; and providing the context vector by concatenating the first and second context vectors.
- The method of claim 9, wherein: the first set of word embedding vectors are provided to the first recurrent neural network starting from the word embedding vector of a word at the beginning of the sentence; and the second set of word embedding vectors are provided to the second recurrent neural network starting from the word embedding vector of a word at the end of the sentence.
- The method of claim 1, wherein the number of hidden units in each of the two recurrent neural networks is at least 300.
- The method of claim 1, wherein the feedforward neural network comprises: a first layer having a first activation function of a fully connected linear operation on the context vector; and a second layer connected to the first layer and having a second activation function for generating the classification value.
- The method of claim 1, wherein the classification value is a probability distribution of the target word over a plurality of classes associated with the grammatical error type.
- The method of claim 1, the detecting further comprising: comparing the estimated classification of the target word with an actual classification of the target word; and detecting the grammatical error in the sentence when the actual classification does not match the estimated classification of the target word.
- The method of claim 1, further comprising: in response to detecting the grammatical error in the sentence, providing a grammatical error correction of the target word based, at least in part, on the estimated classification of the target word.
- The method of claim 1, further comprising: for each of the one or more target words, estimating a respective classification of the target word with respect to the corresponding grammatical error type using a respective artificial neural network model trained for the grammatical error type, and comparing the estimated classification of the target word with an actual classification of the target word to generate a grammatical error result of the target word; applying a weight to each of the grammatical error results of the one or more target words based, at least in part, on the corresponding grammatical error type; and providing a grammar score of the sentence based on the grammatical error results of the one or more target words and the weights.
- The method of claim 16, wherein the grammar score is provided based, at least in part, on information associated with a user from whom the sentence is received.
- The method of claim 1, wherein the model is trained by native training samples.
- The method of claim 1, wherein the two recurrent neural networks and the feedforward neural network are jointly trained.
- The method of claim 1, wherein the model further comprises: another recurrent neural network configured to output a set of initial context vectors to be inputted to the two recurrent neural networks for generating the context vector; and another feedforward neural network configured to output a context weight vector to be applied to the context vector.
- The method of claim 20, wherein all the recurrent neural networks and feedforward neural networks are jointly trained by native training samples.
- A system for grammatical error detection, comprising: a memory; and at least one processor coupled to the memory and configured to: receive a sentence; identify one or more target words in the sentence based, at least in part, on one or more grammatical error types, wherein each of the one or more target words corresponds to at least one of the one or more grammatical error types; for at least one of the one or more target words, estimate a classification of the target word with respect to the corresponding grammatical error type using an artificial neural network model trained for the grammatical error type, wherein the model comprises (i) two recurrent neural networks configured to generate a context vector of the target word based, at least in part, on at least one word before the target word and at least one word after the target word in the sentence, and (ii) a feedforward neural network configured to output a classification value of the target word with respect to the grammatical error type based, at least in part, on the context vector of the target word; and detect a grammatical error in the sentence based, at least in part, on the target word and the estimated classification of the target word.
- The system of claim 22, wherein to estimate a classification of the target word the at least one processor is configured to: provide the context vector of the target word based, at least in part, on the at least one word before the target word and the at least one word after the target word in the sentence using the two recurrent neural networks; and provide the classification value of the target word with respect to the grammatical error type based, at least in part, on the context vector of the target word using the feedforward neural network.
- The system of claim 23, wherein the context vector of the target word is provided based, at least in part, on a lemma of the target word.
- The system of claim 23, wherein to estimate a classification of the target word, the at least one processor is configured to: generate a first set of word embedding vectors, wherein each word embedding vector in the first set of word embedding vectors is generated based, at least in part, on a respective one of the at least one word before the target word in the sentence; and generate a second set of word embedding vectors, wherein each word embedding vector in the second set of word embedding vectors is generated based, at least in part, on a respective one of the at least one word after the target word in the sentence.
- The system of claim 25, wherein the number of dimensions of each word embedding vector is at least 100.
- The system of claim 22, wherein: the at least one word before the target word comprises all words before the target word in the sentence; and the at least one word after the target word comprises all words after the target word in the sentence.
- The system of claim 22, wherein the number of the at least one word before the target word and/or the number of the at least one word after the target word are determined based, at least in part, on the grammatical error type.
- The system of claim 23, wherein to estimate a classification of the target word the at least one processor is configured to: provide a context weight vector of the target word based, at least in part, on the at least one word before the target word and the at least one word after the target word in the sentence; and apply the context weight vector to the context vector.
- The system of claim 25, wherein to provide the context vector of the target word the at least one processor is configured to: provide a first context vector of the target word based, at least in part, on the first set of word embedding vectors using a first one of the two recurrent neural networks; provide a second context vector of the target word based, at least in part, on the second set of word embedding vectors using a second one of the two recurrent neural networks; and provide the context vector by concatenating the first and second context vectors.
- The system of claim 30, wherein: the first set of word embedding vectors are provided to the first recurrent neural network starting from the word embedding vector of a word at the beginning of the sentence; and the second set of word embedding vectors are provided to the second recurrent neural network starting from the word embedding vector of a word at the end of the sentence.
- The system of claim 22, wherein the number of hidden units in each of the two recurrent neural networks is at least 300.
- The system of claim 22, wherein the feedforward neural network comprises: a first layer having a first activation function of a fully connected linear operation on the context vector; and a second layer connected to the first layer and having a second activation function for generating the classification value.
- The system of claim 22, wherein the classification value is a probability distribution of the target word over a plurality of classes associated with the grammatical error type.
- The system of claim 22, wherein to detect a grammatical error the at least one processor is configured to: compare the estimated classification of the target word with an actual classification of the target word; and detect the grammatical error in the sentence when the actual classification does not match the estimated classification of the target word.
- The system of claim 22, the at least one processor further configured to: in response to detecting the grammatical error in the sentence, provide a grammatical error correction of the target word based, at least in part, on the estimated classification of the target word.
- The system of claim 22, the at least one processor further configured to: for each of the one or more target words, estimate a respective classification of the target word with respect to the corresponding grammatical error type using a respective artificial neural network model trained for the grammatical error type, and compare the estimated classification of the target word with an actual classification of the target word to generate a grammatical error result of the target word; apply a weight to each of the grammatical error results of the one or more target words based, at least in part, on the corresponding grammatical error type; and provide a grammar score of the sentence based on the grammatical error results of the one or more target words and the weights.
- The system of claim 37, wherein the grammar score is provided based, at least in part, on information associated with a user from whom the sentence is received.
- The system of claim 22, wherein the model is trained by native training samples.
- The system of claim 22, wherein the two recurrent neural networks and the feedforward neural network are jointly trained.
- The system of claim 22, wherein the model further comprises: another recurrent neural network configured to output a set of initial context vectors to be inputted to the two recurrent neural networks for generating the context vector; and another feedforward neural network configured to output a context weight vector to be applied to the context vector.
- The system of claim 41, wherein all the recurrent neural networks and feedforward neural networks are jointly trained by native training samples.
- A tangible computer-readable device having instructions stored thereon that, when executed by at least one computing device, cause the at least one computing device to perform operations comprising: receiving a sentence; identifying one or more target words in the sentence based, at least in part, on one or more grammatical error types, wherein each of the one or more target words corresponds to at least one of the one or more grammatical error types; for at least one of the one or more target words, estimating a classification of the target word with respect to the corresponding grammatical error type using an artificial neural network model trained for the grammatical error type, wherein the model comprises (i) two recurrent neural networks configured to output a context vector of the target word based, at least in part, on at least one word before the target word and at least one word after the target word in the sentence, and (ii) a feedforward neural network configured to output a classification value of the target word with respect to the grammatical error type based, at least in part, on the context vector of the target word; and detecting a grammatical error in the sentence based, at least in part, on the target word and the estimated classification of the target word.
- A method for training an artificial neural network model, comprising: providing, by at least one processor, an artificial neural network model for estimating a classification of a target word in a sentence with respect to a grammatical error type, wherein the model comprises (i) two recurrent neural networks configured to output a context vector of the target word based, at least in part, on at least one word before the target word and at least one word after the target word in the sentence, and (ii) a feedforward neural network configured to output a classification value of the target word based, at least in part, on the context vector of the target word; obtaining, by the at least one processor, a set of training samples, wherein each training sample in the set of training samples comprises a sentence comprising a target word with respect to the grammatical error type and an actual classification of the target word with respect to the grammatical error type; and jointly adjusting, by the at least one processor, a first set of parameters associated with the recurrent neural networks and a second set of parameters associated with the feedforward neural network based, at least in part, on differences between the actual classifications and estimated classifications of the target words in each training sample.
- The method of claim 44, wherein each training sample is a native training sample without a grammatical error.
- The method of claim 44, wherein the recurrent neural networks are gated recurrent unit (GRU) neural networks, and the feedforward neural network is a multilayer perceptron (MLP) neural network.
- The method of claim 44, wherein the model further comprises: another feedforward neural network configured to output a context weight vector to be applied to the context vector.
- The method of claim 47, the jointly adjusting comprising: jointly adjusting the first and second sets of parameters and a third set of parameters associated with the another feedforward neural network based, at least in part, on the differences between the actual classifications and estimated classifications of the target words in each training sample.
- The method of claim 44, further comprising: for each training sample, generating a first set of word embedding vectors, wherein each word embedding vector in the first set of word embedding vectors is generated based, at least in part, on a respective one of at least one word before the target word in the training sample; and generating a second set of word embedding vectors, wherein each word embedding vector in the second set of word embedding vectors is generated based, at least in part, on a respective one of at least one word after the target word in the training sample.
- The method of claim 49, wherein the number of dimensions of each word embedding vector is at least 100.
- The method of claim 49, wherein: the at least one word before the target word comprises all words before the target word in the sentence; and the at least one word after the target word comprises all words after the target word in the sentence.
- The method of claim 49, further comprising: for each training sample, providing a first context vector of the target word based, at least in part, on the first set of word embedding vectors using a first one of the two recurrent neural networks; providing a second context vector of the target word based, at least in part, on the second set of word embedding vectors using a second one of the two recurrent neural networks; and providing the context vector by concatenating the first and second context vectors.
- The method of claim 52, wherein: the first set of word embedding vectors are provided to the first recurrent neural network starting from the word embedding vector of a word at the beginning of the sentence; and the second set of word embedding vectors are provided to the second recurrent neural network starting from the word embedding vector of a word at the end of the sentence.
- The method of claim 52, wherein the first and second context vectors do not comprise a semantic feature of the sentence in the training sample.
- The method of claim 44, wherein the number of hidden units in each of the two recurrent neural networks is at least 300.
- The method of claim 44, wherein the feedforward neural network comprises: a first layer having a first activation function of a fully connected linear operation on the context vector; and a second layer connected to the first layer and having a second activation function for generating the classification value.
- A system for training an artificial neural network model, comprising: a memory; and at least one processor coupled to the memory and configured to: provide an artificial neural network model for estimating a classification of a target word in a sentence with respect to a grammatical error type, wherein the model comprises (i) two recurrent neural networks configured to output a context vector of the target word based, at least in part, on at least one word before the target word and at least one word after the target word in the sentence, and (ii) a feedforward neural network configured to output a classification value of the target word based, at least in part, on the context vector of the target word; obtain a set of training samples, wherein each training sample in the set of training samples comprises a sentence comprising a target word with respect to the grammatical error type and an actual classification of the target word with respect to the grammatical error type; and jointly adjust a first set of parameters associated with the recurrent neural networks and a second set of parameters associated with the feedforward neural network based, at least in part, on differences between the actual classifications and estimated classifications of the target words in each training sample.
- The system of claim 57, wherein each training sample is a native training sample without a grammatical error.
- The system of claim 57, wherein the recurrent neural networks are GRU neural networks, and the feedforward neural network is an MLP neural network.
- The system of claim 57, wherein the model further comprises: another feedforward neural network configured to output a context weight vector to be applied to the context vector.
- The system of claim 60, wherein to jointly adjust a first set of parameters and a second set of parameters the at least one processor is configured to: jointly adjust the first and second sets of parameters and a third set of parameters associated with the another feedforward neural network based, at least in part, on the differences between the actual classifications and estimated classifications of the target words in each training sample.
- The system of claim 57, the at least one processor further configured to: for each training sample, generate a first set of word embedding vectors, wherein each word embedding vector in the first set of word embedding vectors is generated based, at least in part, on a respective one of at least one word before the target word in the training sample; and generate a second set of word embedding vectors, wherein each word embedding vector in the second set of word embedding vectors is generated based, at least in part, on a respective one of at least one word after the target word in the training sample.
- The system of claim 62, wherein the number of dimensions of each word embedding vector is at least 100.
- The system of claim 62, wherein: the at least one word before the target word comprises all words before the target word in the sentence; and the at least one word after the target word comprises all words after the target word in the sentence.
- The system of claim 62, the at least one processor further configured to: for each training sample, provide a first context vector of the target word based, at least in part, on the first set of word embedding vectors using a first one of the two recurrent neural networks; provide a second context vector of the target word based, at least in part, on the second set of word embedding vectors using a second one of the two recurrent neural networks; and provide the context vector by concatenating the first and second context vectors.
- The system of claim 65, wherein: the first set of word embedding vectors are provided to the first recurrent neural network starting from the word embedding vector of a word at the beginning of the sentence; and the second set of word embedding vectors are provided to the second recurrent neural network starting from the word embedding vector of a word at the end of the sentence.
- The system of claim 65, wherein the first and second context vectors do not comprise a semantic feature of the sentence in the training sample.
- The system of claim 57, wherein the number of hidden units in each of the two recurrent neural networks is at least 300.
- The system of claim 57, wherein the feedforward neural network comprises: a first layer having a first activation function of a fully connected linear operation on the context vector; and a second layer connected to the first layer and having a second activation function for generating the classification value.
- A tangible computer-readable device having instructions stored thereon that, when executed by at least one computing device, cause the at least one computing device to perform operations comprising: providing an artificial neural network model for estimating a classification of a target word in a sentence with respect to a grammatical error type, wherein the model comprises (i) two recurrent neural networks configured to output a context vector of the target word based, at least in part, on at least one word before the target word and at least one word after the target word in the sentence, and (ii) a feedforward neural network configured to output a classification value of the target word based, at least in part, on the context vector of the target word; obtaining a set of training samples, wherein each training sample in the set of training samples comprises a sentence comprising a target word with respect to the grammatical error type and an actual classification of the target word with respect to the grammatical error type; and jointly adjusting a first set of parameters associated with the recurrent neural networks and a second set of parameters associated with the feedforward neural network based, at least in part, on differences between the actual classifications and estimated classifications of the target words in each training sample.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020207005087A KR102490752B1 (en) | 2017-08-03 | 2017-08-03 | Deep context-based grammatical error correction using artificial neural networks |
CN201780094942.2A CN111226222B (en) | 2017-08-03 | 2017-08-03 | Depth context-based grammar error correction using artificial neural networks |
JP2020505241A JP7031101B2 (en) | 2017-08-03 | 2017-08-03 | Methods, systems and tangible computer readable devices |
MX2020001279A MX2020001279A (en) | 2017-08-03 | 2017-08-03 | Deep context-based grammatical error correction using artificial neural networks. |
PCT/CN2017/095841 WO2019024050A1 (en) | 2017-08-03 | 2017-08-03 | Deep context-based grammatical error correction using artificial neural networks |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2017/095841 WO2019024050A1 (en) | 2017-08-03 | 2017-08-03 | Deep context-based grammatical error correction using artificial neural networks |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019024050A1 true WO2019024050A1 (en) | 2019-02-07 |
Family
ID=65233230
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/095841 WO2019024050A1 (en) | 2017-08-03 | 2017-08-03 | Deep context-based grammatical error correction using artificial neural networks |
Country Status (5)
Country | Link |
---|---|
JP (1) | JP7031101B2 (en) |
KR (1) | KR102490752B1 (en) |
CN (1) | CN111226222B (en) |
MX (1) | MX2020001279A (en) |
WO (1) | WO2019024050A1 (en) |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110210294A (en) * | 2019-04-23 | 2019-09-06 | 平安科技(深圳)有限公司 | Evaluation method, device, storage medium and the computer equipment of Optimized model |
CN110309512A (en) * | 2019-07-05 | 2019-10-08 | 北京邮电大学 | A Method of Correcting Chinese Grammar Errors Based on Generative Adversarial Networks |
CN110399607A (en) * | 2019-06-04 | 2019-11-01 | 深思考人工智能机器人科技(北京)有限公司 | A kind of conversational system text error correction system and method based on phonetic |
CN110472243A (en) * | 2019-08-08 | 2019-11-19 | 河南大学 | A method for checking Chinese spelling |
CN110797010A (en) * | 2019-10-31 | 2020-02-14 | 腾讯科技(深圳)有限公司 | Question-answer scoring method, device, equipment and storage medium based on artificial intelligence |
CN110889284A (en) * | 2019-12-04 | 2020-03-17 | 成都中科云集信息技术有限公司 | Multi-task learning Chinese language disease diagnosis method based on bidirectional long-time and short-time memory network |
CN111310447A (en) * | 2020-03-18 | 2020-06-19 | 科大讯飞股份有限公司 | Grammar error correction method, grammar error correction device, electronic equipment and storage medium |
JP2020140183A (en) * | 2019-03-03 | 2020-09-03 | 学校法人甲南学園 | Language learning support device |
CN111914540A (en) * | 2019-05-10 | 2020-11-10 | 阿里巴巴集团控股有限公司 | Statement identification method and device, storage medium and processor |
CN111950292A (en) * | 2020-06-22 | 2020-11-17 | 北京百度网讯科技有限公司 | Text error correction model training method, text error correction processing method and device |
CN112016603A (en) * | 2020-08-18 | 2020-12-01 | 上海松鼠课堂人工智能科技有限公司 | Error analysis method based on graph neural network |
CN112380883A (en) * | 2020-12-04 | 2021-02-19 | 北京有竹居网络技术有限公司 | Model training method, machine translation method, device, equipment and storage medium |
CN112749553A (en) * | 2020-06-05 | 2021-05-04 | 腾讯科技(深圳)有限公司 | Text information processing method and device for video file and server |
US20210271810A1 (en) * | 2020-03-02 | 2021-09-02 | Grammarly Inc. | Proficiency and native language-adapted grammatical error correction |
US11176321B2 (en) | 2019-05-02 | 2021-11-16 | International Business Machines Corporation | Automated feedback in online language exercises |
WO2021260554A1 (en) * | 2020-06-22 | 2021-12-30 | Crimson AI LLP | Domain-specific grammar correction system, server and method for academic text |
CN114818713A (en) * | 2022-05-11 | 2022-07-29 | 安徽理工大学 | Chinese named entity recognition method based on boundary detection |
CN114896966A (en) * | 2022-05-17 | 2022-08-12 | 西安交通大学 | Method, system, equipment and medium for positioning grammar error of Chinese text |
EP4080399A4 (en) * | 2019-12-18 | 2022-11-23 | Fujitsu Limited | INFORMATION PROCESSING PROGRAM, INFORMATION PROCESSING METHOD AND INFORMATION PROCESSING DEVICE |
CN115544259A (en) * | 2022-11-29 | 2022-12-30 | 城云科技(中国)有限公司 | Long text classification preprocessing model and construction method, device and application thereof |
CN116306598A (en) * | 2023-05-22 | 2023-06-23 | 上海蜜度信息技术有限公司 | Customized error correction methods, systems, equipment and media for words in different fields |
CN117350283A (en) * | 2023-10-11 | 2024-01-05 | 西安栗子互娱网络科技有限公司 | Text defect detection method, device, equipment and storage medium |
JP2024500778A (en) * | 2020-12-18 | 2024-01-10 | グーグル エルエルシー | On-device grammar checking |
CN117574860A (en) * | 2024-01-16 | 2024-02-20 | 北京蜜度信息技术有限公司 | Method and equipment for text color rendering |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102517971B1 (en) * | 2020-08-14 | 2023-04-05 | 부산대학교 산학협력단 | Context sensitive spelling error correction system or method using Autoregressive language model |
KR102379660B1 (en) * | 2020-11-30 | 2022-03-29 | 주식회사 티맥스에이아이 | Method for utilizing deep learning based semantic role analysis |
CN114580384A (en) * | 2020-12-02 | 2022-06-03 | 北大方正集团有限公司 | Method, apparatus, medium, and program for training and recognizing grammar error recognition model |
CN112597754B (en) * | 2020-12-23 | 2023-11-21 | 北京百度网讯科技有限公司 | Text error correction methods, devices, electronic equipment and readable storage media |
KR20220106331A (en) * | 2021-01-22 | 2022-07-29 | 삼성전자주식회사 | Electronic apparatus and method for controlling thereof |
JP2022164001A (en) * | 2021-04-15 | 2022-10-27 | 株式会社Nttドコモ | monolingual translator |
CN114372441B (en) * | 2022-03-23 | 2022-06-03 | 中电云数智科技有限公司 | Automatic error correction method and device for Chinese text |
CN118014083B (en) * | 2024-02-29 | 2024-09-17 | 云南联合视觉科技有限公司 | Clinical case analysis problem generation method based on multi-round prompt |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103365838A (en) * | 2013-07-24 | 2013-10-23 | 桂林电子科技大学 | Method for automatically correcting syntax errors in English composition based on multivariate features |
US20130325442A1 (en) * | 2010-09-24 | 2013-12-05 | National University Of Singapore | Methods and Systems for Automated Text Correction |
US20150309982A1 (en) * | 2012-12-13 | 2015-10-29 | Postech Academy-Industry Foundation | Grammatical error correcting system and grammatical error correcting method using the same |
CN106610930A (en) * | 2015-10-22 | 2017-05-03 | 科大讯飞股份有限公司 | Foreign language writing automatic error correction method and system |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6964023B2 (en) * | 2001-02-05 | 2005-11-08 | International Business Machines Corporation | System and method for multi-modal focus detection, referential ambiguity resolution and mood classification using multi-modal input |
CN101739870B (en) * | 2009-12-03 | 2012-07-04 | 深圳先进技术研究院 | Interactive language learning system and method |
US8775341B1 (en) * | 2010-10-26 | 2014-07-08 | Michael Lamport Commons | Intelligent control with hierarchical stacked neural networks |
US10339920B2 (en) * | 2014-03-04 | 2019-07-02 | Amazon Technologies, Inc. | Predicting pronunciation in speech recognition |
KR102199445B1 (en) * | 2014-07-30 | 2021-01-06 | 에스케이텔레콤 주식회사 | Method and apparatus for discriminative training acoustic model based on class, and speech recognition apparatus using the same |
US10115055B2 (en) * | 2015-05-26 | 2018-10-30 | Booking.Com B.V. | Systems methods circuits and associated computer executable code for deep learning based natural language understanding |
US9552547B2 (en) * | 2015-05-29 | 2017-01-24 | Sas Institute Inc. | Normalizing electronic communications using a neural-network normalizer and a neural-network flagger |
US9595002B2 (en) * | 2015-05-29 | 2017-03-14 | Sas Institute Inc. | Normalizing electronic communications using a vector having a repeating substring as input for a neural network |
US20180260860A1 (en) * | 2015-09-23 | 2018-09-13 | Giridhari Devanathan | A computer-implemented method and system for analyzing and evaluating user reviews |
CN105845134B (en) * | 2016-06-14 | 2020-02-07 | 科大讯飞股份有限公司 | Spoken language evaluation method and system for freely reading question types |
- 2017
- 2017-08-03 JP JP2020505241A patent/JP7031101B2/en active Active
- 2017-08-03 CN CN201780094942.2A patent/CN111226222B/en active Active
- 2017-08-03 MX MX2020001279A patent/MX2020001279A/en unknown
- 2017-08-03 KR KR1020207005087A patent/KR102490752B1/en active Active
- 2017-08-03 WO PCT/CN2017/095841 patent/WO2019024050A1/en active IP Right Grant
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130325442A1 (en) * | 2010-09-24 | 2013-12-05 | National University Of Singapore | Methods and Systems for Automated Text Correction |
US20150309982A1 (en) * | 2012-12-13 | 2015-10-29 | Postech Academy-Industry Foundation | Grammatical error correcting system and grammatical error correcting method using the same |
CN103365838A (en) * | 2013-07-24 | 2013-10-23 | 桂林电子科技大学 | Method for automatically correcting syntax errors in English composition based on multivariate features |
CN106610930A (en) * | 2015-10-22 | 2017-05-03 | 科大讯飞股份有限公司 | Foreign language writing automatic error correction method and system |
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2020140183A (en) * | 2019-03-03 | 2020-09-03 | 学校法人甲南学園 | Language learning support device |
CN110210294A (en) * | 2019-04-23 | 2019-09-06 | 平安科技(深圳)有限公司 | Evaluation method and device for an optimized model, storage medium, and computer equipment |
US11176321B2 (en) | 2019-05-02 | 2021-11-16 | International Business Machines Corporation | Automated feedback in online language exercises |
CN111914540A (en) * | 2019-05-10 | 2020-11-10 | 阿里巴巴集团控股有限公司 | Statement identification method and device, storage medium and processor |
CN110399607A (en) * | 2019-06-04 | 2019-11-01 | 深思考人工智能机器人科技(北京)有限公司 | Pinyin-based dialog system text error correction system and method |
CN110399607B (en) * | 2019-06-04 | 2023-04-07 | 深思考人工智能机器人科技(北京)有限公司 | Pinyin-based dialog system text error correction system and method |
CN110309512A (en) * | 2019-07-05 | 2019-10-08 | 北京邮电大学 | Method for correcting Chinese grammar errors based on generative adversarial networks |
CN110472243A (en) * | 2019-08-08 | 2019-11-19 | 河南大学 | A method for checking Chinese spelling |
CN110797010A (en) * | 2019-10-31 | 2020-02-14 | 腾讯科技(深圳)有限公司 | Question-answer scoring method, device, equipment and storage medium based on artificial intelligence |
CN110889284B (en) * | 2019-12-04 | 2023-04-07 | 成都中科云集信息技术有限公司 | Multi-task learning Chinese grammatical error diagnosis method based on bidirectional long short-term memory network |
CN110889284A (en) * | 2019-12-04 | 2020-03-17 | 成都中科云集信息技术有限公司 | Multi-task learning Chinese grammatical error diagnosis method based on bidirectional long short-term memory network |
EP4080399A4 (en) * | 2019-12-18 | 2022-11-23 | Fujitsu Limited | Information processing program, information processing method, and information processing device |
EP4220474A1 (en) * | 2019-12-18 | 2023-08-02 | Fujitsu Limited | Information processing program, information processing method, and information processing device |
US12299389B2 (en) | 2020-03-02 | 2025-05-13 | Grammarly Inc. | Proficiency and native language-adapted grammatical error correction |
US20210271810A1 (en) * | 2020-03-02 | 2021-09-02 | Grammarly Inc. | Proficiency and native language-adapted grammatical error correction |
US11886812B2 (en) * | 2020-03-02 | 2024-01-30 | Grammarly, Inc. | Proficiency and native language-adapted grammatical error correction |
CN111310447B (en) * | 2020-03-18 | 2024-02-02 | 河北省讯飞人工智能研究院 | Grammar error correction method, grammar error correction device, electronic equipment and storage medium |
CN111310447A (en) * | 2020-03-18 | 2020-06-19 | 科大讯飞股份有限公司 | Grammar error correction method, grammar error correction device, electronic equipment and storage medium |
CN112749553A (en) * | 2020-06-05 | 2021-05-04 | 腾讯科技(深圳)有限公司 | Text information processing method and device for video files, and server |
CN112749553B (en) * | 2020-06-05 | 2023-07-25 | 腾讯科技(深圳)有限公司 | Text information processing method and device for video files, and server |
CN111950292A (en) * | 2020-06-22 | 2020-11-17 | 北京百度网讯科技有限公司 | Text error correction model training method, text error correction processing method and device |
WO2021260554A1 (en) * | 2020-06-22 | 2021-12-30 | Crimson AI LLP | Domain-specific grammar correction system, server and method for academic text |
US11593557B2 (en) | 2020-06-22 | 2023-02-28 | Crimson AI LLP | Domain-specific grammar correction system, server and method for academic text |
CN111950292B (en) * | 2020-06-22 | 2023-06-27 | 北京百度网讯科技有限公司 | Text error correction model training method, text error correction processing method and device |
CN112016603A (en) * | 2020-08-18 | 2020-12-01 | 上海松鼠课堂人工智能科技有限公司 | Error analysis method based on graph neural network |
CN112380883B (en) * | 2020-12-04 | 2023-07-25 | 北京有竹居网络技术有限公司 | Model training method, machine translation method, device, equipment and storage medium |
CN112380883A (en) * | 2020-12-04 | 2021-02-19 | 北京有竹居网络技术有限公司 | Model training method, machine translation method, device, equipment and storage medium |
JP2024500778A (en) * | 2020-12-18 | 2024-01-10 | グーグル エルエルシー | On-device grammar checking |
CN114818713A (en) * | 2022-05-11 | 2022-07-29 | 安徽理工大学 | Chinese named entity recognition method based on boundary detection |
CN114896966A (en) * | 2022-05-17 | 2022-08-12 | 西安交通大学 | Method, system, equipment and medium for locating grammatical errors in Chinese text |
CN115544259B (en) * | 2022-11-29 | 2023-02-17 | 城云科技(中国)有限公司 | Long text classification preprocessing model and construction method, device and application thereof |
CN115544259A (en) * | 2022-11-29 | 2022-12-30 | 城云科技(中国)有限公司 | Long text classification preprocessing model and construction method, device and application thereof |
CN116306598A (en) * | 2023-05-22 | 2023-06-23 | 上海蜜度信息技术有限公司 | Customized error correction method, system, equipment and medium for words in different fields |
CN116306598B (en) * | 2023-05-22 | 2023-09-08 | 上海蜜度信息技术有限公司 | Customized error correction method, system, equipment and medium for words in different fields |
CN117350283A (en) * | 2023-10-11 | 2024-01-05 | 西安栗子互娱网络科技有限公司 | Text defect detection method, device, equipment and storage medium |
CN117574860A (en) * | 2024-01-16 | 2024-02-20 | 北京蜜度信息技术有限公司 | Method and equipment for text color rendering |
Also Published As
Publication number | Publication date |
---|---|
JP7031101B2 (en) | 2022-03-08 |
CN111226222B (en) | 2023-07-07 |
JP2020529666A (en) | 2020-10-08 |
MX2020001279A (en) | 2020-08-20 |
KR102490752B1 (en) | 2023-01-20 |
CN111226222A (en) | 2020-06-02 |
KR20200031154A (en) | 2020-03-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102490752B1 (en) | Deep context-based grammatical error correction using artificial neural networks | |
KR102329127B1 (en) | Apparatus and method for converting dialect into standard language | |
Yu et al. | Learning composition models for phrase embeddings | |
Yang et al. | Joint relational embeddings for knowledge-based question answering | |
US20140163951A1 (en) | Hybrid adaptation of named entity recognition | |
US20080215309A1 (en) | Extraction-Empowered machine translation | |
US11775763B2 (en) | Weakly supervised and explainable training of a machine-learning-based named-entity recognition (NER) mechanism | |
US20140365201A1 (en) | Training markov random field-based translation models using gradient ascent | |
US11941361B2 (en) | Automatically identifying multi-word expressions | |
Woodsend et al. | Text rewriting improves semantic role labeling | |
CN112668319A (en) | Vietnamese news event detection method based on Chinese information and Vietnamese syntax guidance | |
Hasan et al. | Neural clinical paraphrase generation with attention | |
US12248753B2 (en) | Bridging semantics between words and definitions via aligning word sense inventories | |
CN110991193B (en) | OpenKiwi-based translation matrix model selection system | |
Tedla et al. | Analyzing word embeddings and improving POS tagger of tigrinya | |
CN111144134B (en) | OpenKiwi-based automatic evaluation system for translation engine | |
Siddique et al. | Bilingual word embeddings for cross-lingual personality recognition using convolutional neural nets | |
Sardarov | Development and design of deep learning-based parts-of-speech tagging system for Azerbaijani language | |
Escolano Peinado | Learning multilingual and multimodal representations with language-specific encoders and decoders for machine translation | |
Tkachenko et al. | Neural morphological tagging for Estonian | |
Barkovska et al. | Automatic text translation system for artificial languages | |
Park et al. | Classification‐Based Approach for Hybridizing Statistical and Rule‐Based Machine Translation | |
Wegari et al. | Parts of speech tagging for Afaan Oromo | |
Antony et al. | Statistical method for English to Kannada transliteration | |
Nikiforova et al. | Language Models for Cloze Task Answer Generation in Russian |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17919750 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2020505241 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 20207005087 Country of ref document: KR Kind code of ref document: A |
|
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 24/03/2020) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 17919750 Country of ref document: EP Kind code of ref document: A1 |
|
WWG | Wipo information: grant in national office |
Ref document number: MX/A/2020/001279 Country of ref document: MX |