JP2022175661A

JP2022175661A - PROOFREADING SUPPORT DEVICE, PROOFREADING SUPPORT METHOD, AND PROGRAM

Info

Publication number: JP2022175661A
Application number: JP2021082276A
Authority: JP
Inventors: 尚徳金山; Hisanori Kanayama; 雄大平野; Takehiro Hirano
Original assignee: Toppan Printing Co Ltd
Current assignee: Toppan Inc
Priority date: 2021-05-14
Filing date: 2021-05-14
Publication date: 2022-11-25
Anticipated expiration: 2041-05-14
Also published as: JP7718097B2

Abstract

To provide a proofreading assisting device, a proofreading assisting method, and a program capable of extracting a character string that may be an orthographical variant without increasing the time required for proofreading, even when the text is long. SOLUTION: A proofreading assisting device includes: an obtaining unit that obtains a target sentence to be proofread; a dividing unit that generates divided sentences by dividing the target sentence into three or more parts; a connecting unit that generates a connected sentence which is obtained by connecting mutually different divided sentences and is shorter than the target sentence; and an evaluating unit that extracts a character string which is a candidate for an orthographical variant in the connected sentence. SELECTED DRAWING: Figure 2

Description

The present invention relates to a proofreading support device, a proofreading support method, and a program.

Text is commonly proofread for spelling variations, that is, inconsistent notations of the same word. For example, Patent Literature 1 discloses a technique that extracts character strings that are candidates for spelling variations from the text to be proofread, compares each extracted candidate with the notations of the other candidates, and determines whether a spelling variation is present.

Patent Literature 1: JP-A-3-184162

In the method described in Patent Literature 1, each character string extracted from the text is compared with the notations of all the other candidates. For example, if the number of character strings extracted from the text is K (where K is any natural number), the number of comparisons is K×(K−1). Consequently, when the text to be proofread is short, about 100 words, roughly 10,000 comparisons suffice, but for a long text of about 1,000,000 words the number of comparisons reaches about 100 million, growing at an accelerating rate. As a result, checking a long text for spelling variations can take several days, which is impractical.
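The growth of K×(K−1) can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name `pairwise_comparisons` and the sample values of K are illustrative assumptions, not figures taken from the cited literature.

```python
def pairwise_comparisons(k: int) -> int:
    """Number of comparisons when each of k strings is compared with all others."""
    return k * (k - 1)

# Hypothetical candidate counts, chosen only to show how quickly the count grows.
for k in (100, 10_000):
    print(f"K = {k:>6,}: {pairwise_comparisons(k):,} comparisons")
```

With K = 100 the count is just under 10,000, and a count of about 100 million corresponds to roughly K = 10,000 extracted strings.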

The present invention has been made in view of these circumstances, and its object is to provide a proofreading support device, a proofreading support method, and a program that can extract character strings that may be spelling variations without increasing the time required for proofreading, even for long text.

The proofreading support device of the present invention includes: an acquisition unit that acquires a target sentence to be proofread; a dividing unit that generates divided sentences by dividing the target sentence into three or more parts; a connecting unit that generates connected sentences, each formed by connecting mutually different divided sentences and each shorter than the target sentence; and an evaluation unit that extracts character strings that are candidates for spelling variations in the connected sentences.

The proofreading support method of the present invention is a proofreading support method performed by a computer, in which an acquisition unit acquires a target sentence to be proofread; a dividing unit generates divided sentences by dividing the target sentence into three or more parts; a connecting unit generates connected sentences, each formed by connecting mutually different divided sentences and each shorter than the target sentence; and an evaluation unit extracts character strings that are candidates for spelling variations in the connected sentences.

The program of the present invention is a program for causing a computer to operate as the proofreading support device described above, and causes the computer to function as each unit of the proofreading support device.

According to the present invention, the risk of overlooking spelling variations is low, and the time required for proofreading can be kept from increasing even for long text.

FIG. 1 is a diagram explaining the processing performed by the proofreading support device 10 according to the embodiment.
FIG. 2 is a block diagram showing a configuration example of the proofreading support device 10 according to the embodiment.
FIGS. 3 to 8 are diagrams each showing an example of the list information 120 according to the embodiment.
FIG. 9 is a flowchart showing the flow of processing performed by the proofreading support device 10 according to the embodiment.
FIGS. 10A to 10D are diagrams explaining the processing for extracting spelling variation candidates.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

First, a method of extracting candidates for spelling variations (spelling variation candidates) from a sentence will be described. FIGS. 10A to 10D are diagrams explaining the processing for extracting spelling variation candidates.

FIG. 10A shows an example of a sentence. As shown in FIG. 10A, the following describes, as an example, the extraction of spelling variation candidates from the sentence "取り扱いがあります。取り扱いは有ります。取扱いがあります。取扱はないですね。" (roughly, "There is handling. There is handling. There is handling. There is no handling, is there."), in which the same words are written in several different notations.

FIG. 10B shows an example of the sentence divided into words (morphemes). As shown in FIG. 10B, dividing the sentence of FIG. 10A into words yields, for example, "取り扱い／が／あり／ます／。／取り扱い／は／有り／ます／。／取扱い／が／あり／ます／。／取扱／は／ない／です／ね／。". Here, "／" is a symbol indicating a delimiter.

The processing for extracting spelling variation candidates will be described with reference to FIGS. 10C and 10D. FIG. 10C shows the processing for extracting spelling variation candidates. FIG. 10D shows an example of the list used in the evaluation of FIG. 10C.

The table in FIG. 10C has columns such as word, number of words registered in the list, evaluation result, and whether the word was registered in the list. The word column shows the words contained in the sentence, in order. The number-of-registered-words column shows the number of words registered in the list. The list, an example of which is shown in FIG. 10D, is used to evaluate the words contained in the sentence. Evaluation here means determining whether a word contained in the sentence matches or is similar to a word registered in the list.

The evaluation result column shows the result of evaluating each word contained in the sentence, that is, the result of determining whether the word matches (or is similar to) a word registered in the list.

The list registration column shows whether the word contained in the sentence was registered in the list. A word contained in the sentence is registered in the list when the evaluation determines that it neither matches nor is similar to any word registered in the list.

The table in FIG. 10D has columns such as list No., word, element 1, element 2, and so on. The list No. is identification information, such as a number, that uniquely identifies a word registered in the list. The word column shows the word identified by the list No. The element columns show words determined to be similar to the word identified by the list No.

First, evaluation starts with no words registered in the list. That is, it is determined whether the word "取り扱い" ("handling"), shown at the top of the table in FIG. 10C, matches or is similar to a word registered in the list. At this point no words are registered in the list, so the evaluation result for "取り扱い" is "no match or similarity". A word with the evaluation result "no match or similarity" is registered in the list. FIG. 10C shows that "取り扱い" was registered as No. 1 in the list, and FIG. 10D shows "取り扱い" registered as No. 1.

Next, it is determined whether the word "が" matches or is similar to a word registered in the list. Only "取り扱い" is registered in the list, so the evaluation result for "が" is "no match or similarity", and the word is registered in the list. FIG. 10C shows that "が" was registered as No. 2 in the list, and FIG. 10D shows "が" registered as No. 2.

Next, it is determined whether the word "あり" matches or is similar to a word registered in the list. Only "取り扱い" and "が" are registered in the list, so the evaluation result for "あり" is "no match or similarity", and the word is registered in the list. FIG. 10C shows that "あり" was registered as No. 3 in the list, and FIG. 10D shows "あり" registered as No. 3.

Next, it is determined whether the word "ます" matches or is similar to a word registered in the list. Only "取り扱い", "が", and "あり" are registered in the list, so the evaluation result for "ます" is "no match or similarity", and the word is registered in the list. FIG. 10C shows that "ます" was registered as No. 4 in the list, and FIG. 10D shows "ます" registered as No. 4.

Next, it is determined whether the word "。" matches or is similar to a word registered in the list. Only "取り扱い", "が", "あり", and "ます" are registered in the list, so the evaluation result for "。" is "no match or similarity", and the word is registered in the list. FIG. 10C shows that "。" was registered as No. 5 in the list, and FIG. 10D shows "。" registered as No. 5.

Next, it is determined whether the second occurrence of the word "取り扱い" matches or is similar to a word registered in the list. At this point "取り扱い", "が", "あり", "ます", and "。" are registered in the list, so the evaluation result for "取り扱い" is "matches No. 1". A word with a matching evaluation result is not newly registered in the list.

Next, it is determined whether the word "は" matches or is similar to a word registered in the list. Only "取り扱い", "が", "あり", "ます", and "。" are registered in the list, so the evaluation result for "は" is "no match or similarity", and the word is registered in the list. FIG. 10C shows that "は" was registered as No. 6 in the list, and FIG. 10D shows "は" registered as No. 6.

Next, it is determined whether the word "有り" matches or is similar to a word registered in the list. At this point "取り扱い", "が", "あり", "ます", "。", and "は" are registered in the list, so the evaluation result for "有り" is "similar to" the word "あり" of list No. 3. A word evaluated as similar is not newly registered in the list. Instead, it is added as an element of the word it was determined to be similar to. Here, FIG. 10D shows an example in which "有り" has been added as element 1 of No. 3.

The criterion for whether words are similar to each other may be determined arbitrarily. For example, consider an operation in which the proofreading support device 10 extracts spelling variation candidates and the user decides whether to correct them to a unified notation. In this case, words that would be corrected as spelling variations in the text are preferably determined to be similar.

For example, in the sentence of FIG. 10A, it is preferable that "取り扱い", "取扱い", and "取扱" (three notations of the word for "handling") be determined to be similar words. This is because, when these notations are mixed in a text, a proofreader would normally judge them to be spelling variations and unify them into one notation. Likewise, it is preferable that "あり" and "有り" (two notations of "there is") be determined to be similar words, because a proofreader would normally judge a mixture of them to be a spelling variation and unify them into one notation.

Note that the example of FIG. 10C illustrates a case in which "あり" ("there is") and "ない" ("there is not") are determined not to be similar. However, the embodiment is not limited to this, and "あり" and "ない" may be determined to be similar words. For example, for a text in which a proofreader might judge mixed use of "あり" and "ない" to be a spelling variation and unify them into one notation, the proofreading support device 10 is programmed to determine that "あり" and "ない" are similar words.

As shown in the example of FIG. 10C, spelling variations are extracted by evaluating the words in the sentence in order, and words are registered in the list based on the evaluation results. Each word in the sentence must be compared with all the words registered in the list at the time it is evaluated, so the time required for evaluation increases with the number of words registered in the list. For example, as shown in the "number of words registered in the list" column of FIG. 10C, nine words are ultimately registered in the list for the sentence of FIG. 10A.
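The sequential evaluation walked through above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the names `VARIANT_GROUPS`, `is_similar`, and `evaluate` are hypothetical, the hand-written variant table merely stands in for whatever similarity criterion is actually used, and the word segmentation is taken directly from FIG. 10B rather than produced by a morphological analyzer.

```python
# Hypothetical variant table standing in for the similarity judgment.
VARIANT_GROUPS = [
    {"取り扱い", "取扱い", "取扱"},
    {"あり", "有り"},
]

def is_similar(a: str, b: str) -> bool:
    """True if two different words belong to the same variant group."""
    return a != b and any(a in g and b in g for g in VARIANT_GROUPS)

def evaluate(words):
    """Scan words in order and build a list like the one in FIG. 10D.

    Returns (entries, comparisons). entries maps each registered word to
    its elements (similar words); insertion order gives list No. 1, 2, ...
    """
    entries = {}
    comparisons = 0
    for word in words:
        for head, elements in entries.items():
            comparisons += 1
            if word == head or word in elements:
                break  # match: nothing is newly registered
            if is_similar(word, head):
                elements.append(word)  # similar: add as an element
                break
        else:
            entries[word] = []  # no match or similarity: register

    return entries, comparisons

TEXT = "取り扱い／が／あり／ます／。／取り扱い／は／有り／ます／。／取扱い／が／あり／ます／。／取扱／は／ない／です／ね／。"
words = TEXT.split("／")
entries, comparisons = evaluate(words)
for no, (head, elements) in enumerate(entries.items(), start=1):
    print(no, head, elements)
print("registered words:", len(entries), "comparisons:", comparisons)
```

Because every word is checked against the registered words one by one, the comparison counter grows with the size of the list, which is the cost the division described next is meant to contain.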

If the sentence is short and few words are extracted from it, evaluation does not take much time. However, if the sentence is long and many words are extracted from it, evaluation takes an enormous amount of time and becomes impractical.

To address this, the present embodiment divides the sentence. Dividing makes each piece shorter, which shortens the time required for evaluation. However, evaluating only the divided sentences cannot ensure consistent notation across the whole text. For example, when a text is divided into two, the first half may consistently use "取り扱い" and the second half may consistently use "取扱い". In this case no spelling variation occurs within either half, yet across the whole text the notation varies between "取り扱い" and "取扱い". Evaluating only the divided sentences therefore risks overlooking spelling variations.

To address this, the present embodiment connects the divided sentences and evaluates the connected sentences, so that notation is made consistent within each connected sentence. Furthermore, by forming the connected sentences from every round-robin combination of divided sentences, notation is made consistent across the whole text.

FIG. 1 is a diagram explaining the processing performed by the proofreading support device 10 according to the embodiment. The upper part of FIG. 1 shows the target sentence ABCD, which is the sentence to be proofread. In this example, the target sentence ABCD is "取り扱いがあります。取り扱いは有ります。取扱いがあります。取扱はないですね。".

First, the proofreading support device 10 generates divided sentences by dividing the target sentence. In this example, the target sentence ABCD is divided into four divided sentences A to D.

Next, the proofreading support device 10 generates connected sentences by connecting divided sentences. In this example, the connected sentences AB, AC, AD, BC, BD, and CD are generated, each by connecting two different divided sentences among the divided sentences A to D.
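Generating every pair of divided sentences is a plain combination enumeration. The sketch below assumes, for illustration only, that each of the four sentences of the example text forms one divided sentence; the names `divided` and `connected` are hypothetical.

```python
from itertools import combinations

# Assumed division of the example target sentence ABCD into A to D.
divided = {
    "A": "取り扱いがあります。",
    "B": "取り扱いは有ります。",
    "C": "取扱いがあります。",
    "D": "取扱はないですね。",
}

# Every combination of two different divided sentences, in order.
connected = {
    a + b: divided[a] + divided[b]
    for a, b in combinations(divided, 2)
}

target = "".join(divided.values())
for name, text in connected.items():
    print(name, text, f"({len(text)} of {len(target)} characters)")
```

With four divided sentences this yields six connected sentences, each shorter than the target sentence.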

The proofreading support device 10 then evaluates each of the connected sentences AB, AC, AD, BC, BD, and CD and generates a list (list information 120, described later) for each. When the same word is registered in more than one of these lists, the proofreading support device 10 merges that word and its group of elements. The merged list matches the list that would be generated by evaluating the whole sentence.
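One way to picture the merge is to treat each list entry as a group (the word plus its elements) and to combine groups that share a word. The sketch below is an assumption about how such a merge could be done, not the procedure of the embodiment; `VARIANT_GROUPS`, `is_similar`, `evaluate`, and `merge` are hypothetical names, and the similarity table is hand-written for the example.

```python
from itertools import combinations

VARIANT_GROUPS = [{"取り扱い", "取扱い", "取扱"}, {"あり", "有り"}]  # hypothetical

def is_similar(a, b):
    return a != b and any(a in g and b in g for g in VARIANT_GROUPS)

def evaluate(words):
    """Build {registered word: [similar words]} by scanning words in order."""
    entries = {}
    for word in words:
        for head, elements in entries.items():
            if word == head or word in elements:
                break
            if is_similar(word, head):
                elements.append(word)
                break
        else:
            entries[word] = []
    return entries

def merge(lists):
    """Merge lists: groups that share any word are combined into one group."""
    groups = []
    for entries in lists:
        for head, elements in entries.items():
            new = {head, *elements}
            for g in [g for g in groups if g & new]:
                groups.remove(g)
                new |= g
            groups.append(new)
    return {frozenset(g) for g in groups}

# Divided sentences A to D, already segmented into words as in FIG. 10B.
A = ["取り扱い", "が", "あり", "ます", "。"]
B = ["取り扱い", "は", "有り", "ます", "。"]
C = ["取扱い", "が", "あり", "ます", "。"]
D = ["取扱", "は", "ない", "です", "ね", "。"]

pair_lists = [evaluate(x + y) for x, y in combinations([A, B, C, D], 2)]
merged = merge(pair_lists)
whole = merge([evaluate(A + B + C + D)])
print(merged == whole)
```

For this example the groups obtained from the six pairwise lists coincide with those obtained from the whole sentence, because every pair of divided sentences appears together in some connected sentence.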

Here, the proofreading support device 10 may evaluate the connected sentences AB, AC, AD, BC, BD, and CD in parallel. In this case, a number of evaluations corresponding to the number of divisions is processed in parallel, and increasing the number of divisions can shorten the time required for evaluation. On the other hand, the total number of calculations may increase compared with evaluating the whole sentence. However, given recent improvements in computer performance, the increase in the total number of calculations does not greatly affect the processing time. That is, even if the total amount of calculation increases, the reduction in processing time obtained through parallel processing is large. Therefore, the time required for proofreading does not increase even for long text.
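The trade-off described above, more total work but a shorter longest task, can be illustrated by counting comparisons. The sketch below uses a thread pool from the Python standard library purely to show the structure; for CPU-bound work in CPython a process pool would be the more usual choice. All names (`VARIANT_GROUPS`, `is_similar`, `evaluate`, `segments`, `evaluate_pair`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

VARIANT_GROUPS = [{"取り扱い", "取扱い", "取扱"}, {"あり", "有り"}]  # hypothetical

def is_similar(a, b):
    return a != b and any(a in g and b in g for g in VARIANT_GROUPS)

def evaluate(words):
    """Return (entries, comparisons) for one sequential evaluation."""
    entries, comparisons = {}, 0
    for word in words:
        for head, elements in entries.items():
            comparisons += 1
            if word == head or word in elements:
                break
            if is_similar(word, head):
                elements.append(word)
                break
        else:
            entries[word] = []
    return entries, comparisons

segments = {
    "A": ["取り扱い", "が", "あり", "ます", "。"],
    "B": ["取り扱い", "は", "有り", "ます", "。"],
    "C": ["取扱い", "が", "あり", "ます", "。"],
    "D": ["取扱", "は", "ない", "です", "ね", "。"],
}
pairs = list(combinations(segments, 2))

def evaluate_pair(pair):
    a, b = pair
    return evaluate(segments[a] + segments[b])

# Evaluate the six connected sentences concurrently.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(evaluate_pair, pairs))

pair_counts = [count for _, count in results]
_, whole_count = evaluate([w for seg in segments.values() for w in seg])
print("whole sentence:", whole_count)
print("sum over pairs:", sum(pair_counts))
print("longest single pair:", max(pair_counts))
```

In this small example the summed comparisons over all pairs exceed those of the whole sentence, while the longest single pair needs fewer, which is what bounds the elapsed time when the pairs run in parallel.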

The number of divided sentences into which the proofreading support device 10 divides the target sentence may be determined arbitrarily. However, dividing the target sentence into two and connecting the two divided sentences merely reproduces the target sentence and is therefore meaningless. For this reason, the proofreading support device 10 divides the target sentence into at least three parts.

FIG. 2 is a block diagram showing a configuration example of the proofreading support device 10 according to the embodiment. The proofreading support device 10 is a computer device that extracts character strings that are candidates for spelling variations in a sentence to be proofread (target sentence). For example, a server device, a cloud, or a PC (Personal Computer) can be used as the proofreading support device 10.

The proofreading support device 10 includes, for example, a communication unit 11, a storage unit 12, and a control unit 13. The communication unit 11 communicates with external devices via a communication network or the like. For example, the communication unit 11 receives text information indicating the target sentence from an external server device or the like. The communication unit 11 also transmits the character strings extracted from the target sentence as spelling variation candidates to an external server device or the like as the proofreading result.

The storage unit 12 is composed of a storage medium such as an HDD (Hard Disk Drive), flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), RAM (Random Access read/write Memory), or ROM (Read Only Memory), or a combination of these. The storage unit 12 stores programs for executing the various processes of the proofreading support device 10 and temporary data used when performing those processes. The storage unit 12 stores, for example, list information 120. The list information 120 is information about the spelling variation candidates extracted from the target sentence.

The control unit 13 is realized by causing a CPU (Central Processing Unit) provided as hardware in the proofreading support device 10 to execute a program. The control unit 13 controls the proofreading support device 10 as a whole. The control unit 13 includes, for example, an acquisition unit 130, a dividing unit 131, a connecting unit 132, an evaluation unit 133, and a device control unit 134.

The acquisition unit 130 acquires the target sentence. For example, the acquisition unit 130 acquires information indicating the target sentence from an external server device or the like via the communication unit 11. The acquisition unit 130 outputs the acquired information indicating the target sentence to the dividing unit 131.

The dividing unit 131 generates divided sentences based on the target sentence. For example, the dividing unit 131 generates the divided sentences so that each contains roughly the same number of words. In this case, the dividing unit 131 divides the target sentence into words and counts the words in the target sentence. It then divides the word count by the number of divisions and searches for a boundary at which to divide the target sentence among the character strings just before and after the position corresponding to the resulting number of words.

The dividing unit 131 uses, for example, periods, symbols, and commas as boundaries for dividing the target sentence. Symbols include, for example, the exclamation mark "！", the question mark "？", the musical note "♪", and the colon "：". Periods, symbols, and commas are used as boundaries because they are unlikely to be subject to spelling variation. At a minimum, the dividing unit 131 avoids placing a boundary in the middle of a word. For example, if a boundary were set between "取り扱" and "い" within the word "取り扱い", the word that was originally "取り扱い" would be evaluated as "取り扱" in the preceding divided sentence, which could lead to an incorrect evaluation.

The dividing unit 131 determines the boundaries at which to divide the target sentence and generates a plurality of divided sentences by dividing at the determined boundaries. The dividing unit 131 outputs information indicating the generated divided sentences to the connecting unit 132.
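The boundary search described for the dividing unit 131 can be sketched as follows. This is an illustration under assumptions: the function `split_tokens`, the `window` parameter, and the `BOUNDARY_TOKENS` set are hypothetical, the input is assumed to be already segmented into words, and the text is assumed to be much longer than the number of divisions.

```python
BOUNDARY_TOKENS = {"。", "、", "！", "？", "♪", "："}  # assumed boundary marks

def split_tokens(tokens, n_parts, window=3):
    """Split a word list into n_parts of roughly equal word count.

    Each cut is placed just after a boundary token found within `window`
    words of the ideal position; cuts never fall inside a word because
    the input is already a list of whole words.
    """
    total = len(tokens)
    cuts = [0]
    for k in range(1, n_parts):
        ideal = round(k * total / n_parts)
        lo = max(cuts[-1] + 1, ideal - window)
        hi = min(total - 1, ideal + window)
        candidates = [i for i in range(lo, hi + 1)
                      if tokens[i - 1] in BOUNDARY_TOKENS]
        if candidates:
            cut = min(candidates, key=lambda i: abs(i - ideal))
        else:
            cut = max(lo, ideal)  # fall back to a plain word boundary
        cuts.append(cut)
    cuts.append(total)
    return [tokens[a:b] for a, b in zip(cuts, cuts[1:])]

TEXT = "取り扱い／が／あり／ます／。／取り扱い／は／有り／ます／。／取扱い／が／あり／ます／。／取扱／は／ない／です／ね／。"
parts = split_tokens(TEXT.split("／"), 4)
for name, part in zip("ABCD", parts):
    print(name, "".join(part))
```

Applied to the example text with four divisions, the cuts fall after each period, giving the divided sentences A to D used in FIG. 1.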

The connecting unit 132 generates connected sentences. For example, the connecting unit 132 generates, as connected sentences, all possible combinations of two mutually different divided sentences. The connecting unit 132 outputs information indicating the generated connected sentences to the evaluation unit 133.

The evaluation unit 133 evaluates each of the connected sentences. For example, the evaluation unit 133 divides a connected sentence into words and, for each word in order, determines whether it matches and whether it is similar to a word registered in the list (list information 120). If the word neither matches nor is similar to any word registered in the list, the evaluation unit 133 registers the word in the list. If the word is similar to a word registered in the list, the evaluation unit 133 adds the word as an element of that registered (similar) word.

装置制御部１３４は、校正支援装置１０を統括的に制御する。例えば、装置制御部１３４は、通信部１１が受信した対象文章を示すテキスト情報を、取得部１３０に出力する。装置制御部１３４は、評価部１３３が評価した結果として生成されたリスト（リスト情報１２０）を併合する。装置制御部１３４は、併合したリストを、表記ゆれの候補を示すリストとして、外部のサーバ装置に送信する。 The device control unit 134 controls the proofreading support apparatus 10 as a whole. For example, the device control unit 134 outputs text information indicating the target sentence received by the communication unit 11 to the acquisition unit 130. The device control unit 134 merges the lists (list information 120) generated as a result of the evaluation by the evaluation unit 133. The device control unit 134 transmits the merged list to an external server device as a list indicating candidates for spelling variation.

図３～図８は、実施形態によるリスト情報１２０の例を示す図である。図３～図８には、例えば、リストＮｏ、単語、要素１、要素２…などの項目が示されている。これらの項目は、図１０Ｄの表に示された項目と同様であるためその説明を省略する。 FIGS. 3 to 8 are diagrams illustrating examples of the list information 120 according to the embodiment. FIGS. 3 to 8 show items such as list No., word, element 1, element 2, and so on. These items are the same as those shown in the table of FIG. 10D, so their description is omitted.

図３には、連結文章ＡＢに対応して生成されたリストの例が、リスト情報１２０Ａとして示されている。リスト情報１２０Ａには、連結文章ＡＢにおける、６－７文字目に記載された「あり」との単語と、１６－１７文字目に記載された「有り」との単語とが、表記ゆれの候補として抽出された例が示されている。 FIG. 3 shows an example of the list generated for the connected sentence AB as list information 120A. The list information 120A shows an example in which the word "あり" at the 6th to 7th characters and the word "有り" at the 16th to 17th characters of the connected sentence AB (both read "ari", meaning "present") are extracted as candidates for spelling variation.

図４には、連結文章ＡＣに対応して生成されたリストの例が、リスト情報１２０Ｂとして示されている。リスト情報１２０Ｂには、連結文章ＡＣにおける、１－４文字目に記載された「取り扱い」との単語と、２１－２３文字目に記載された「取扱い」との単語とが、表記ゆれの候補として抽出された例が示されている。 FIG. 4 shows an example of the list generated for the connected sentence AC as list information 120B. The list information 120B shows an example in which the word "取り扱い" at the 1st to 4th characters and the word "取扱い" at the 21st to 23rd characters of the connected sentence AC (both read "toriatsukai", meaning "handling") are extracted as candidates for spelling variation.

図５には、連結文章ＡＤに対応して生成されたリストの例が、リスト情報１２０Ｃとして示されている。リスト情報１２０Ｃには、連結文章ＡＤにおける、１－４文字目に記載された「取り扱い」との単語と、３０－３１文字目に記載された「取扱」との単語とが、表記ゆれの候補として抽出された例が示されている。 FIG. 5 shows an example of the list generated for the connected sentence AD as list information 120C. The list information 120C shows an example in which the word "取り扱い" at the 1st to 4th characters and the word "取扱" at the 30th to 31st characters of the connected sentence AD are extracted as candidates for spelling variation.

図６には、連結文章ＢＣに対応して生成されたリストの例が、リスト情報１２０Ｄとして示されている。リスト情報１２０Ｄには、連結文章ＢＣにおける、１１－１４文字目に記載された「取り扱い」との単語と、２１－２３文字目に記載された「取扱い」との単語とが、表記ゆれの候補として抽出された例が示されている。また、リスト情報１２０Ｄには、連結文章ＢＣにおける、１６－１７文字目に記載された「有り」との単語と、２５－２６文字目に記載された「あり」との単語とが、表記ゆれの候補として抽出された例が示されている。 FIG. 6 shows an example of the list generated for the connected sentence BC as list information 120D. The list information 120D shows an example in which the word "取り扱い" at the 11th to 14th characters and the word "取扱い" at the 21st to 23rd characters of the connected sentence BC are extracted as candidates for spelling variation. The list information 120D also shows an example in which the word "有り" at the 16th to 17th characters and the word "あり" at the 25th to 26th characters of the connected sentence BC are extracted as candidates for spelling variation.

図７には、連結文章ＢＤに対応して生成されたリストの例が、リスト情報１２０Ｅとして示されている。リスト情報１２０Ｅには、連結文章ＢＤにおける、１１－１４文字目に記載された「取り扱い」との単語と、３０－３１文字目に記載された「取扱」との単語とが、表記ゆれの候補として抽出された例が示されている。 FIG. 7 shows an example of the list generated for the connected sentence BD as list information 120E. The list information 120E shows an example in which the word "取り扱い" at the 11th to 14th characters and the word "取扱" at the 30th to 31st characters of the connected sentence BD are extracted as candidates for spelling variation.

図８には、連結文章ＣＤに対応して生成されたリストの例が、リスト情報１２０Ｆとして示されている。リスト情報１２０Ｆには、連結文章ＣＤにおける、２１－２３文字目に記載された「取扱い」との単語と、３０－３１文字目に記載された「取扱」との単語とが、表記ゆれの候補として抽出された例が示されている。 FIG. 8 shows an example of the list generated for the connected sentence CD as list information 120F. The list information 120F shows an example in which the word "取扱い" at the 21st to 23rd characters and the word "取扱" at the 30th to 31st characters of the connected sentence CD are extracted as candidates for spelling variation.

ここで、装置制御部１３４が、リスト（リスト情報１２０）を併合する方法について説明する。まず、装置制御部１３４は、それぞれのリストにて示された表記ゆれの候補となる単語が記載された位置を特定し、特定した位置が重複するものを併合する。 Here, the method by which the device control unit 134 merges the lists (list information 120) is described. First, the device control unit 134 identifies the position at which each candidate word in each list appears, and merges the entries whose identified positions coincide.

例えば、図３～図８に示すようなリスト情報１２０Ａ～１２０Ｆが生成された場合、装置制御部１３４は、リスト情報１２０Ａにおける「あり」と「有り」の表記ゆれ、及びリスト情報１２０Ｄにおける「有り」と「あり」の表記ゆれについて、それぞれのリストにおいて単語が記載された位置を特定する。 For example, when the list information 120A to 120F shown in FIGS. 3 to 8 has been generated, the device control unit 134 identifies the position at which each word appears in the respective list for the "あり"/"有り" variation in the list information 120A and for the "有り"/"あり" variation in the list information 120D.

リスト情報１２０Ａにおける「あり」との単語が記載された位置は、６－７文字目である。リスト情報１２０Ａにおける「有り」との単語が記載された位置は、１６－１７文字目である。リスト情報１２０Ｄにおける「有り」との単語が記載された位置は、１６－１７文字目である。リスト情報１２０Ｄにおける「あり」との単語が記載された位置は、２５－２６文字目である。 In the list information 120A, the word "あり" appears at the 6th to 7th characters and the word "有り" appears at the 16th to 17th characters. In the list information 120D, the word "有り" appears at the 16th to 17th characters and the word "あり" appears at the 25th to 26th characters.

それぞれのリストにおいて「有り」との単語が記載された位置が、１６－１７文字目で重複するものである。この場合、装置制御部１３４は、リスト情報１２０Ａにおける「あり」と「有り」の表記ゆれ、及びリスト情報１２０Ｄにおける「有り」と「あり」の表記ゆれを、同一グループとみなして併合する。この結果、６－７文字目の「あり」と、１６－１７文字目の「有り」と、２５－２６文字目の「あり」とが、対象文章における表記ゆれの候補となる。 The position of the word "有り" is the same in both lists, namely the 16th to 17th characters. In this case, the device control unit 134 regards the "あり"/"有り" variation in the list information 120A and the "有り"/"あり" variation in the list information 120D as the same group and merges them. As a result, "あり" at the 6th to 7th characters, "有り" at the 16th to 17th characters, and "あり" at the 25th to 26th characters become candidates for spelling variation in the target sentence.

また、装置制御部１３４は、リスト情報１２０Ｂにおける「取り扱い」と「取扱い」の表記ゆれ、リスト情報１２０Ｃにおける「取り扱い」と「取扱」の表記ゆれ、リスト情報１２０Ｄにおける「取り扱い」と「取扱い」の表記ゆれ、リスト情報１２０Ｅにおける「取り扱い」と「取扱」の表記ゆれ、及びリスト情報１２０Ｆにおける「取扱い」と「取扱」の表記ゆれについて、それぞれのリストにおいて単語が記載された位置を特定する。 The device control unit 134 also identifies the position at which each word appears in the respective list for the "取り扱い"/"取扱い" variation in the list information 120B, the "取り扱い"/"取扱" variation in the list information 120C, the "取り扱い"/"取扱い" variation in the list information 120D, the "取り扱い"/"取扱" variation in the list information 120E, and the "取扱い"/"取扱" variation in the list information 120F.

リスト情報１２０Ｂにおける「取り扱い」との単語が記載された位置は、１－４文字目である。リスト情報１２０Ｂにおける「取扱い」との単語が記載された位置は、２１－２３文字目である。リスト情報１２０Ｃにおける「取り扱い」との単語が記載された位置は、１－４文字目である。リスト情報１２０Ｃにおける「取扱」との単語が記載された位置は、３０－３１文字目である。 In the list information 120B, the word "取り扱い" appears at the 1st to 4th characters and the word "取扱い" appears at the 21st to 23rd characters. In the list information 120C, the word "取り扱い" appears at the 1st to 4th characters and the word "取扱" appears at the 30th to 31st characters.

リスト情報１２０Ｄにおける「取り扱い」との単語が記載された位置は、１１－１４文字目である。リスト情報１２０Ｄにおける「取扱い」との単語が記載された位置は、２１－２３文字目である。リスト情報１２０Ｅにおける「取り扱い」との単語が記載された位置は、１１－１４文字目である。リスト情報１２０Ｅにおける「取扱」との単語が記載された位置は、３０－３１文字目である。 In the list information 120D, the word "取り扱い" appears at the 11th to 14th characters and the word "取扱い" appears at the 21st to 23rd characters. In the list information 120E, the word "取り扱い" appears at the 11th to 14th characters and the word "取扱" appears at the 30th to 31st characters.

リスト情報１２０Ｆにおける「取扱い」との単語が記載された位置は、２１－２３文字目である。リスト情報１２０Ｆにおける「取扱」との単語が記載された位置は、３０－３１文字目である。 In the list information 120F, the word "取扱い" appears at the 21st to 23rd characters and the word "取扱" appears at the 30th to 31st characters.

装置制御部１３４は、それぞれのリストにおいて「取り扱い」との単語が記載された位置が、１－４文字目で重複するものについて併合する。装置制御部１３４は、リスト情報１２０Ｂにおける「取り扱い」と「取扱い」の表記ゆれ、及びリスト情報１２０Ｃにおける「取り扱い」と「取扱」の表記ゆれを、同一グループとみなして併合する。この結果、１－４文字目の「取り扱い」と、２１－２３文字目の「取扱い」と、３０－３１文字目の「取扱」とが、対象文章における表記ゆれの候補となる。 The device control unit 134 merges the entries in which the word "取り扱い" appears at the same position, the 1st to 4th characters. It regards the "取り扱い"/"取扱い" variation in the list information 120B and the "取り扱い"/"取扱" variation in the list information 120C as the same group and merges them. As a result, "取り扱い" at the 1st to 4th characters, "取扱い" at the 21st to 23rd characters, and "取扱" at the 30th to 31st characters become candidates for spelling variation in the target sentence.

また、装置制御部１３４は、それぞれのリストにおいて「取扱い」との単語が記載された位置が、２１－２３文字目で重複するものについて併合する。装置制御部１３４は、リスト情報１２０Ｂにおける「取り扱い」と「取扱い」の表記ゆれ、リスト情報１２０Ｄにおける「取り扱い」と「取扱い」の表記ゆれ、及びリスト情報１２０Ｆにおける「取扱い」と「取扱」の表記ゆれを、同一グループとみなして併合する。この結果、１－４文字目の「取り扱い」と、１１－１４文字目の「取り扱い」と、２１－２３文字目の「取扱い」と、３０－３１文字目の「取扱」とが、対象文章における表記ゆれの候補となる。 The device control unit 134 also merges the entries in which the word "取扱い" appears at the same position, the 21st to 23rd characters. It regards the "取り扱い"/"取扱い" variation in the list information 120B, the "取り扱い"/"取扱い" variation in the list information 120D, and the "取扱い"/"取扱" variation in the list information 120F as the same group and merges them. As a result, "取り扱い" at the 1st to 4th characters, "取り扱い" at the 11th to 14th characters, "取扱い" at the 21st to 23rd characters, and "取扱" at the 30th to 31st characters become candidates for spelling variation in the target sentence.

また、装置制御部１３４は、それぞれのリストにおいて「取り扱い」との単語が記載された位置が、１１－１４文字目で重複するものについて併合する。装置制御部１３４は、リスト情報１２０Ｄにおける「取り扱い」と「取扱い」の表記ゆれ、及びリスト情報１２０Ｅにおける「取り扱い」と「取扱」の表記ゆれを、同一グループとみなして併合する。この結果、１１－１４文字目の「取り扱い」と、２１－２３文字目の「取扱い」と、３０－３１文字目の「取扱」とが、対象文章における表記ゆれの候補となる。 The device control unit 134 also merges the entries in which the word "取り扱い" appears at the same position, the 11th to 14th characters. It regards the "取り扱い"/"取扱い" variation in the list information 120D and the "取り扱い"/"取扱" variation in the list information 120E as the same group and merges them. As a result, "取り扱い" at the 11th to 14th characters, "取扱い" at the 21st to 23rd characters, and "取扱" at the 30th to 31st characters become candidates for spelling variation in the target sentence.

上記より、１－４文字目の「取り扱い」と、２１－２３文字目の「取扱い」と、３０－３１文字目の「取扱」とが、対象文章における表記ゆれの候補となるリスト（第１リスト）が生成される。また、１－４文字目の「取り扱い」と、１１－１４文字目の「取り扱い」と、２１－２３文字目の「取扱い」と、３０－３１文字目の「取扱」とが、対象文章における表記ゆれの候補となるリスト（第２リスト）が生成される。１１－１４文字目の「取り扱い」と、２１－２３文字目の「取扱い」と、３０－３１文字目の「取扱」とが、対象文章における表記ゆれの候補となるリスト（第３リスト）が生成される。この場合、第１リストから第３リストのそれぞれの要素が互いに重複する。この場合、装置制御部１３４は、第１リストから第３リストを一つのリストに併合する。この結果、１－４文字目の「取り扱い」と、１１－１４文字目の「取り扱い」と、２１－２３文字目の「取扱い」と、３０－３１文字目の「取扱」とが、対象文章における表記ゆれの候補となる。 The steps above produce three lists. The first list contains "取り扱い" at the 1st to 4th characters, "取扱い" at the 21st to 23rd characters, and "取扱" at the 30th to 31st characters. The second list contains "取り扱い" at the 1st to 4th characters, "取り扱い" at the 11th to 14th characters, "取扱い" at the 21st to 23rd characters, and "取扱" at the 30th to 31st characters. The third list contains "取り扱い" at the 11th to 14th characters, "取扱い" at the 21st to 23rd characters, and "取扱" at the 30th to 31st characters. The elements of the first to third lists overlap one another, so the device control unit 134 merges the first to third lists into a single list. As a result, "取り扱い" at the 1st to 4th characters, "取り扱い" at the 11th to 14th characters, "取扱い" at the 21st to 23rd characters, and "取扱" at the 30th to 31st characters become candidates for spelling variation in the target sentence.
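
The position-based merge walked through above can be sketched in Python as follows. The representation is an assumption made for illustration: each occurrence is a tuple of (first character, last character, word) using the 1-based positions quoted from FIGS. 3 to 8, each candidate group is a set of such tuples, and the function name `merge_groups` is hypothetical. Groups that share an occurrence are united, and the union is repeated until no two groups overlap, which has the same effect as merging the intermediate first to third lists.

```python
Occurrence = tuple[int, int, str]  # (first char, last char, word), 1-based


def merge_groups(groups: list[set[Occurrence]]) -> list[set[Occurrence]]:
    """Unite candidate groups that share at least one occurrence."""
    merged: list[set[Occurrence]] = []
    for group in groups:
        current = set(group)
        disjoint = []
        for existing in merged:
            if existing & current:
                current |= existing  # same position appears in both: unite
            else:
                disjoint.append(existing)
        disjoint.append(current)
        merged = disjoint
    return merged


if __name__ == "__main__":
    groups = [
        {(6, 7, "あり"), (16, 17, "有り")},       # list information 120A
        {(1, 4, "取り扱い"), (21, 23, "取扱い")},  # 120B
        {(1, 4, "取り扱い"), (30, 31, "取扱")},    # 120C
        {(11, 14, "取り扱い"), (21, 23, "取扱い")},  # 120D
        {(16, 17, "有り"), (25, 26, "あり")},     # 120D
        {(11, 14, "取り扱い"), (30, 31, "取扱")},  # 120E
        {(21, 23, "取扱い"), (30, 31, "取扱")},    # 120F
    ]
    for group in merge_groups(groups):
        print(sorted(group))
```

Run on the values quoted from the figures, the sketch ends with two groups: one with the three "あり"/"有り" occurrences and one with the four "取り扱い"/"取扱い"/"取扱" occurrences.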

図９は、実施形態による校正支援装置１０が行う処理の流れを示すフローチャートである。校正支援装置１０は、校正対象（対象文章）を取得し(ステップＳ１０）、取得した対象文章を分割して（ステップＳ１１）分割文章を生成する。校正支援装置１０は、分割文章を連結して（ステップＳ１２）、連結文章を生成する。校正支援装置１０は、連結文章を取得し（ステップＳ１３）、取得した連結文章を評価することによって、連結文章における表記ゆれの候補を抽出する（ステップＳ１４）。校正支援装置１０は、全ての連結文章について評価を行ったか否かを判定し（ステップＳ１５）、まだ評価していない連結文章がある場合にはステップＳ１３に戻る。全ての連結文章について評価を行った場合、校正支援装置１０は、評価に伴って生成されたリスト（リスト情報１２０）について、重複して登録された単語を併合する（ステップＳ１６）。校正支援装置１０は、併合したリストを、表記ゆれの候補を抽出した結果として、例えば、外部のサーバ装置に送信する。 FIG. 9 is a flow chart showing the flow of processing performed by the proofreading support apparatus 10 according to the embodiment. The proofreading support apparatus 10 acquires a proofreading target (target sentence) (step S10), divides the acquired target sentence (step S11), and generates divided sentences. The proofreading support device 10 connects the divided sentences (step S12) to generate a connected sentence. The proofreading support device 10 acquires the concatenated sentence (step S13), and by evaluating the acquired concatenated sentence, extracts candidates for spelling variations in the concatenated sentence (step S14). The proofreading support apparatus 10 determines whether or not all the connected sentences have been evaluated (step S15), and if there are any connected sentences that have not yet been evaluated, the process returns to step S13. When all the connected sentences have been evaluated, the proofreading support device 10 merges words registered in duplicate in the list (list information 120) generated along with the evaluation (step S16). The proofreading support apparatus 10 transmits the merged list to, for example, an external server apparatus as a result of extracting spelling variation candidates.

以上説明したように、実施形態の校正支援装置１０は、取得部１３０と、分割部１３１と、連結部１３２と、評価部１３３とを備える。取得部１３０は、対象文章を取得する。分割部１３１は、対象文章を、少なくとも三つ以上に分割してなる分割文章を生成する。連結部１３２は、分割文章のうち、互いに異なる分割文章を連結してなる連結文章であって、対象文章より短い連結文章を生成する。評価部１３３は、連結文章における表記ゆれの候補となる文字列を抽出する。これにより、実施形態の校正支援装置１０では、対象文章より短い連結文章における表記ゆれの候補となる文字列を抽出することができる。このため、対象文章が長文である場合であっても、対象文章より短い連結文章を校正対象とすることができる。したがって、長文の文章であっても校正に要する時間を増大させることなく、表記ゆれの候補となる文字列を抽出することが可能である。 As described above, the proofreading support apparatus 10 of the embodiment includes the acquisition unit 130, the dividing unit 131, the connecting unit 132, and the evaluation unit 133. The acquisition unit 130 acquires a target sentence. The dividing unit 131 generates divided sentences by dividing the target sentence into three or more parts. The connecting unit 132 generates a connected sentence that is formed by connecting mutually different divided sentences and that is shorter than the target sentence. The evaluation unit 133 extracts character strings that are candidates for spelling variation in the connected sentence. As a result, the proofreading support apparatus 10 of the embodiment can extract candidate character strings from connected sentences that are shorter than the target sentence. Therefore, even when the target sentence is long, connected sentences shorter than the target sentence can serve as the proofreading targets. It is thus possible to extract character strings that are candidates for spelling variation, even from a long text, without increasing the time required for proofreading.

また、実施形態の校正支援装置１０では、評価部１３３は、連結文章に含まれる対象単語を、表記ゆれの有無を判定する単語の一覧を示すリスト情報１２０に登録された登録単語と比較する。評価部１３３は、対象単語が登録単語と一致しない又は類似しない場合、対象単語をリスト情報１２０に登録する。評価部１３３は、対象単語が登録単語と類似する場合、対象単語をリスト情報１２０において対象単語に類似する登録単語の要素に追加する。評価部１３３は、リスト情報１２０に登録された登録単語のうち、当該登録単語と当該登録単語の要素に追加された単語を、表記ゆれの候補とする。これにより、実施形態の校正支援装置１０は、連結文章に含まれる対象単語をリスト情報１２０と比較し、一致しない又は類似しない場合に登録し、類似する場合に要素に追加する、という容易な方法にて、表記ゆれの候補を抽出することが可能である。 Further, in the proofreading support apparatus 10 of the embodiment, the evaluation unit 133 compares a target word contained in the connected sentence with the registered words in the list information 120, which indicates the list of words for which the presence or absence of spelling variation is determined. When the target word neither matches nor is similar to any registered word, the evaluation unit 133 registers the target word in the list information 120. When the target word is similar to a registered word, the evaluation unit 133 adds the target word to the elements of that registered word in the list information 120. Among the registered words in the list information 120, the evaluation unit 133 treats each registered word and the words added to its elements as candidates for spelling variation. The proofreading support apparatus 10 of the embodiment can thus extract candidates for spelling variation by a simple method: compare each target word in the connected sentence with the list information 120, register it when it neither matches nor is similar, and add it as an element when it is similar.

また、実施形態の校正支援装置１０では、装置制御部１３４を更に備える。装置制御部１３４は、連結文章のそれぞれに対応して生成されたリスト情報１２０に基づいて、それぞれのリスト情報１２０に登録された登録単語のうち、複数のリスト情報１２０に重複して登録された登録単語を併合させる。これにより、実施形態の校正支援装置１０では、連結文章のそれぞれのリスト情報１２０に基づいて、重複なく、対象文章における表記ゆれを抽出することができる。 Further, the proofreading support device 10 of the embodiment further includes a device control section 134 . Based on the list information 120 generated corresponding to each of the linked sentences, the device control unit 134 selects, among the registered words registered in each of the list information 120, the registered words that are redundantly registered in the plurality of list information 120. Merge registered words. As a result, the proofreading support apparatus 10 of the embodiment can extract spelling variations in the target sentence without duplication based on the list information 120 of each of the linked sentences.

ここで、比較例を考える。評価に係る時間を短縮するための対策として、表記ゆれが発生しやすい単語のリストを用意し、リストに登録された単語のみを表記ゆれがないかチェックすることで校正の高速化を図ることが考えられる。しかし、この方法では予めリストに登録されていない単語の表記ゆれをチェックすることができない。このため、表記ゆれを見逃してしまう可能性がある。 Now consider a comparative example. One conceivable way to shorten the evaluation time is to prepare a list of words that are prone to spelling variation and to check only the words registered in that list, thereby speeding up proofreading. However, this method cannot check spelling variations of words that are not registered in the list in advance. Spelling variations may therefore be overlooked.

これに対し、本実施形態では、連結文章に対応するリスト情報１２０を生成する。リスト情報１２０は、連結文章に記載された単語が、他の単語と一致するか否か、類似するか否かに応じて作成される。このため、対応する連結文章を生成する。したがって、表記ゆれを見逃してしまう可能性を低減させて表記ゆれを見逃すリスクを低減させることが可能である。 In contrast, the present embodiment generates list information 120 corresponding to each connected sentence. The list information 120 is created according to whether each word in the connected sentence matches, or is similar to, another word, so it does not rely on a list of words registered in advance. The corresponding connected sentences are generated for this purpose. This reduces the risk of overlooking spelling variations.

また、実施形態の校正支援装置１０では、連結部１３２は、互いに異なる二つの分割文章における全ての組合せに対応する連結文章を生成する。これにより、実施形態の校正支援装置１０では、全ての分割文章について、一方の分割文章において統一された表記が、他方の分割文章における表記ゆれに該当するような場合であっても、互いの表記ゆれを抽出することができ、表記ゆれを見逃すリスクを低減させることが可能である。 In addition, in the proofreading support apparatus 10 of the embodiment, the connecting unit 132 generates connected sentences corresponding to all combinations of two mutually different divided sentences. As a result, even when a notation is used consistently within one divided sentence and differs only from the notation used in another divided sentence, the variation between the two can be extracted for every pair of divided sentences, which reduces the risk of overlooking spelling variations.

また、実施形態の校正支援装置１０では、評価部１３３は、連結文章のそれぞれについて、表記ゆれの候補となる文字列を抽出する処理を、並列に実行する。これにより、実施形態の校正支援装置１０では、評価に要する時間を短縮させることが可能である。 Further, in the proofreading support apparatus 10 of the embodiment, the evaluation unit 133 executes, in parallel, a process of extracting character strings that are candidates for spelling variation for each of the concatenated sentences. As a result, the proofreading support apparatus 10 of the embodiment can shorten the time required for evaluation.
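
One way to run the per-sentence evaluation in parallel is sketched below in Python with the standard library's `concurrent.futures`. The function name `evaluate_all`, the `evaluate` callback, and the worker count are assumptions for illustration; the embodiment does not state how the parallel execution is realized.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

Result = TypeVar("Result")


def evaluate_all(
    connected: dict[str, str],
    evaluate: Callable[[str], Result],
    max_workers: int = 4,
) -> dict[str, Result]:
    """Evaluate every connected sentence concurrently.

    Each connected sentence is independent of the others, so the
    evaluations can run side by side and be collected per label.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the inputs.
        results = list(pool.map(evaluate, connected.values()))
    return dict(zip(connected.keys(), results))


if __name__ == "__main__":
    # len() stands in for a real evaluation function in this illustration.
    print(evaluate_all({"AB": "<text AB>", "AC": "<text AC>"}, len))
```

A thread pool keeps the sketch short. For CPU-bound evaluation in CPython, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface and is usually the better fit, provided the evaluation function can be pickled.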

また、実施形態の校正支援装置１０では、分割部１３１は、対象文章における句点、記号又は読点のいずれかを境界として、前記対象文章を分割する。これにより、実施形態の校正支援装置１０では、単語の途中に境界が設定されることがなく、表記ゆれを見逃すリスクを低減させることが可能である。 In addition, in the proofreading support apparatus 10 of the embodiment, the dividing unit 131 divides the target sentence using any of the periods, symbols, and commas in the target sentence as boundaries. As a result, in the proofreading support apparatus 10 of the embodiment, boundaries are not set in the middle of words, and the risk of overlooking spelling variations can be reduced.

上述した実施形態における校正支援装置１０の全部または一部をコンピュータで実現するようにしてもよい。その場合、この機能を実現するためのプログラムをコンピュータ読み取り可能な記録媒体に記録して、この記録媒体に記録されたプログラムをコンピュータシステムに読み込ませ、実行することによって実現してもよい。なお、ここでいう「コンピュータシステム」とは、ＯＳや周辺機器等のハードウェアを含むものとする。また、「コンピュータ読み取り可能な記録媒体」とは、フレキシブルディスク、光磁気ディスク、ＲＯＭ、ＣＤ－ＲＯＭ等の可搬媒体、コンピュータシステムに内蔵されるハードディスク等の記憶装置のことをいう。さらに「コンピュータ読み取り可能な記録媒体」とは、インターネット等のネットワークや電話回線等の通信回線を介してプログラムを送信する場合の通信線のように、短時間の間、動的にプログラムを保持するもの、その場合のサーバやクライアントとなるコンピュータシステム内部の揮発性メモリのように、一定時間プログラムを保持しているものも含んでもよい。また上記プログラムは、前述した機能の一部を実現するためのものであってもよく、さらに前述した機能をコンピュータシステムにすでに記録されているプログラムとの組み合わせで実現できるものであってもよく、ＦＰＧＡ等のプログラマブルロジックデバイスを用いて実現されるものであってもよい。 All or part of the proofreading support apparatus 10 in the above-described embodiment may be implemented by a computer. In that case, a program for realizing this function may be recorded in a computer-readable recording medium, and the program recorded in this recording medium may be read into a computer system and executed. It should be noted that the "computer system" referred to here includes hardware such as an OS and peripheral devices. The term "computer-readable recording medium" refers to portable media such as flexible discs, magneto-optical discs, ROMs and CD-ROMs, and storage devices such as hard discs incorporated in computer systems. Furthermore, "computer-readable recording medium" means a medium that dynamically retains a program for a short period of time, like a communication line when transmitting a program via a network such as the Internet or a communication line such as a telephone line. It may also include something that holds the program for a certain period of time, such as a volatile memory inside a computer system that serves as a server or client in that case. Further, the program may be for realizing a part of the functions described above, or may be capable of realizing the functions described above in combination with a program already recorded in the computer system. It may be implemented using a programmable logic device such as FPGA.

以上、この発明の実施形態について図面を参照して詳述してきたが、具体的な構成はこの実施形態に限られるものではなく、この発明の要旨を逸脱しない範囲の設計等も含まれる。 Although the embodiment of the present invention has been described in detail with reference to the drawings, the specific configuration is not limited to this embodiment, and design and the like are included within the scope of the gist of the present invention.

DESCRIPTION OF SYMBOLS
１０…校正支援装置 10... Proofreading support apparatus
１３０…取得部 130... Acquisition unit
１３１…分割部 131... Dividing unit
１３２…連結部 132... Connecting unit
１３３…評価部 133... Evaluation unit

Claims

A proofreading support device comprising:
an acquisition unit that acquires a target sentence to be proofread;
a dividing unit that generates divided sentences by dividing the target sentence into three or more parts;
a connecting unit that generates a connected sentence formed by connecting mutually different divided sentences among the divided sentences, the connected sentence being shorter than the target sentence; and
an evaluation unit that extracts a character string that is a candidate for spelling variation in the connected sentence.

The proofreading support device according to claim 1, wherein the evaluation unit:
compares a target word contained in the connected sentence with registered words registered in list information indicating a list of words for which the presence or absence of spelling variation is determined;
registers the target word in the list information when the target word neither matches nor is similar to any registered word; and
adds the target word, when the target word is similar to a registered word, to the elements of that registered word in the list information.

The proofreading support device according to claim 2, wherein the evaluation unit treats, among the registered words registered in the list information, each registered word and the words added to the elements of that registered word as candidates for spelling variation.

The proofreading support device according to claim 3, further comprising a device control unit that, based on the list information generated for each of the connected sentences, merges registered words that are registered in duplicate in a plurality of pieces of the list information.

The proofreading support device according to any one of claims 1 to 4, wherein the connecting unit generates the connected sentences corresponding to all combinations of two mutually different divided sentences.

The proofreading support device according to any one of claims 1 to 5, wherein the evaluation unit executes, in parallel, the process of extracting character strings that are candidates for spelling variation for each of the connected sentences.

The proofreading support device according to any one of claims 1 to 6, wherein the dividing unit divides the target sentence using any one of a period, a symbol, or a comma in the target sentence as a boundary.

A proofreading support method performed by a computer, the method comprising:
acquiring, by an acquisition unit, a target sentence to be proofread;
generating, by a dividing unit, divided sentences by dividing the target sentence into three or more parts;
generating, by a connecting unit, a connected sentence formed by connecting mutually different divided sentences among the divided sentences, the connected sentence being shorter than the target sentence; and
extracting, by an evaluation unit, a character string that is a candidate for spelling variation in the connected sentence.

A program for causing a computer to operate as the proofreading support device according to any one of claims 1 to 7, the program causing the computer to function as each unit provided in the proofreading support device.