
WO2004053724A1 - Data conversion device, data conversion method, and recording medium comprising a data conversion program - Google Patents

Data conversion device, data conversion method, and recording medium comprising a data conversion program

Info

Publication number
WO2004053724A1
WO2004053724A1 PCT/JP2003/015565 JP0315565W WO2004053724A1 WO 2004053724 A1 WO2004053724 A1 WO 2004053724A1 JP 0315565 W JP0315565 W JP 0315565W WO 2004053724 A1 WO2004053724 A1 WO 2004053724A1
Authority
WO
WIPO (PCT)
Prior art keywords
character
block
information
format
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2003/015565
Other languages
English (en)
Japanese (ja)
Inventor
Yuko Kanemoto
Hideaki Tanaka
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sharp Corp
Original Assignee
Sharp Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sharp Corp
Priority to AU2003289192A
Publication of WO2004053724A1
Anticipated expiration
Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/174 Form filling; Merging

Definitions

  • The present invention relates to a data conversion device, a data conversion method, and a recording medium on which a data conversion program is recorded.
  • More specifically, the present invention relates to a data conversion device, a data conversion method, and a recording medium recording a data conversion program for converting static document data, which includes character information and character position information (information on the display position of the character information), into dynamic document data, which does not include the position information of the characters.
  • Representative examples of the former include a document file of DTP (Desktop Publishing) software and a PDF (Portable Document Format) file.
  • Typical examples of the latter are plain text files (files containing only text) and HTML (Hypertext Markup Language) files.
  • the former is called a static document format, and the latter is called a dynamic document format.
  • The document shown in FIG. 25A is a document in which the katakana characters are set in bold, and the text is divided into blocks at the points where the font switches.
  • FIG. 25B shows a display screen of an English document in a static document format, and FIG. 26B shows the sequence of blocks in the static document of FIG. 25B.
  • The document shown in FIG. 25B is a document in which the word “document” is set in a bold font, and the text is divided into blocks at the points where the font switches.
  • The document contains 18 blocks, block 1 to block 18, and within each block the characters are arranged correctly.
  • However, the blocks themselves are not in reading order, so arranging the characters in block order does not produce the correct text.
  • A document file in a dynamic document format, which has no position information, is displayed by a browser (browsing software) specific to the file type, which arranges the characters and images in the document according to certain rules.
  • Such a rule is, for example, for plain text, arranging the characters in order and breaking lines at line feed codes, and for HTML, arranging the content according to each tag.
  • PDAs Personal Digital Assistants
  • PDCs Personal Digital Cellular
  • The screen resolution of a mobile terminal is lower than that of a personal computer, which makes it poorly suited to displaying the static document format. Documents on mobile terminals are therefore expected to be used mainly in the dynamic document format. As the platform for browsing electronic documents shifts to mobile terminals, technology for converting information in the static document format into the dynamic document format with high efficiency will become important.
  • The electronic filing system disclosed in the above publication converts an electronic document into data of the electronic filing system based on information transmitted from an application to a printer driver.
  • The system performs data conversion by “printing” the document from an application capable of printing, using a conversion printer driver.
  • The data passed to the printer driver when the application prints has been converted into a unified format by GDI (Graphics Device Interface). Therefore, by preparing a printer driver that interprets common GDI commands and performs data conversion, the above electronic filing system absorbs the differences among many electronic document formats and realizes many-to-one data conversion.
  • However, conventional technology such as the electronic filing system disclosed in the above publication has the problem that the character codes of the data passed to the printer driver do not form a continuous series of sentences, especially with specific applications such as DTP software.
  • DTP software renders text in ways that exceed the expressive ability of the GDI commands, such as drawing one character at a time.
  • The present invention has been made in view of these problems, and an object thereof is to provide a data conversion device and a data conversion method capable of high-precision conversion from static document format data to dynamic document format data.
  • Another object of the present invention is to provide a recording medium on which a data conversion program is recorded.
  • A data conversion device according to the present invention converts document data in a first format, which includes character information and character position information (information on the display position of the character information), into document data in a second format, which does not include the character position information.
  • The document data of the first format includes a plurality of blocks, each including at least one piece of character information and the character position information of that character information.
  • The data conversion device includes: a block area determining unit that determines, for each block of the document data of the first format and based on the position information of the characters included in the block, the block area in which the character information of the block is displayed when the document data is displayed in a mode corresponding to the first format; an inter-block-area character connection determination unit that determines, for each block, the blocks to be connected based on the relationship between the determined block areas; and a document creation unit that creates document data of the second format by connecting the blocks that the inter-block-area character connection determination unit has determined should be connected.
  • the blocks are connected and converted into document data not including the position information.
  • This makes it possible to convert data in a static document format into data in a dynamic document format with higher accuracy than recognizing character strings using a threshold based on the size of the displayed characters, such as character height 1 to 2.
  • Preferably, the device further includes a line dividing unit that divides the character information into lines, and a line feed position determining unit that determines, for each line produced by the line dividing unit and based on the character information at the beginning of that line, whether line feed information should be inserted immediately before the line in the second-format document data created by the document creating unit.
  • Preferably, the inter-block-area character connection determination unit determines that the blocks corresponding to a first block area and a second block area should be connected when a first condition and a second condition are both satisfied. The first condition is that the last character in the first block area and the first character in the second block area, or the first character in the first block area and the last character in the second block area, are on the same line. The second condition is that no third block area, different from the first and second block areas, lies on that line between the last character in the first block area and the first character in the second block area, or between the first character in the first block area and the last character in the second block area.
  • A data conversion method according to the present invention converts document data of a first format, which includes character information and character position information (information on the display position of the character information), into document data of a second format, which does not include the character position information.
  • The data conversion method includes the steps of: determining, for each block of the document data of the first format and based on the position information of the characters included in the block, the block area in which the character information of the block is displayed when the document data is displayed in a mode corresponding to the first format; determining, for each block, the blocks to be connected based on the relationship between the block areas; and creating document data in the second format by connecting the blocks determined to be connected.
  • the document data of the first format includes a plurality of blocks including at least one character information and character position information of the character information.
  • the recording medium on which the data conversion program according to the present invention has been recorded stores document information in a first format including character information and character position information that is information on a display position of the character information.
  • the data conversion program recorded on the recording medium sends to the computer, for each block of the document data of the first format, the first data based on the character position information of the character included in each block.
  • the document data in the first format includes a plurality of blocks including at least one character information and character position information of the character information.
  • FIG. 1 is a block diagram showing a specific example of a configuration of a personal computer (PC), an input unit, an external storage device, and a display unit, which are document processing apparatuses according to the first embodiment of the present invention.
  • FIG. 2 is a control block diagram showing the configuration of the personal computer, input unit, external storage device, and display unit of FIG.
  • FIG. 3 is a flowchart of the conversion process performed by the personal computer of FIG.
  • FIG. 4 is a diagram schematically showing a part of character information and detailed information of the character information stored in the character information buffer of FIG.
  • FIG. 5 is a diagram schematically showing information stored in the block area buffer of FIG.
  • FIG. 6 is a flowchart of a subroutine of a block area determination process in FIG.
  • FIG. 7 is a diagram schematically showing the static document data of FIG. 25A together with the names of the block areas determined for all the blocks included in the static document data by the block area determination process of FIG. 6.
  • FIG. 8 is a flowchart of a subroutine of a character connection determination process between block areas in FIG.
  • FIG. 9 is a flowchart of a subroutine of a process for creating a correct document in FIG.
  • FIG. 10 is a diagram showing an example of static document data to be subjected to data conversion according to the first embodiment of the present invention.
  • FIG. 11 is a diagram showing the dynamic document data obtained by data conversion of the static document data shown in FIG. 10 according to the first embodiment of the present invention.
  • FIG. 12 is a control block diagram illustrating a configuration of a personal computer, an input unit, an external storage device, and a display unit according to the second embodiment of the present invention.
  • FIG. 13 is a flowchart of the conversion process performed by the personal computer of FIG.
  • FIG. 14 is a flowchart of a subroutine of the merging process of the block areas in FIG.
  • FIG. 15 is a flowchart of a subroutine of the line division processing of FIG.
  • FIG. 16 is a flowchart of a subroutine of the line feed position determination processing of FIG.
  • FIG. 17 is a diagram schematically illustrating an example of a block state after the block area merging processing of FIG. 14 is performed.
  • FIG. 18 is a diagram schematically illustrating another example of the state of the block after the block area merging processing in FIG. 14 is performed.
  • FIG. 19 is a flowchart of a subroutine of the process for creating a correct answer document with line feed shown in FIG.
  • FIG. 20 is a diagram illustrating a display example of a static document targeted in the document processing device according to the third embodiment of the present invention.
  • FIG. 21 is a flowchart of a process executed by the document processing apparatus according to the third embodiment of the present invention.
  • FIG. 21 is a flowchart showing a modified example of a part of the correct document creation process of FIG.
  • FIG. 22A shows the contents displayed in FIG. 20 together with the coordinates of the block area when it is determined that the block area (k) is inside the block area (i).
  • FIGS. 22B and 22C are diagrams showing screens as a result of converting the static document shown in FIG. 22A into a dynamic document.
  • FIG. 23A shows the contents displayed in FIG. 20 together with the coordinates of the block area when it is determined that the block area (k) is at the lower boundary of the block area (i).
  • FIGS. 23B and 23C are diagrams showing screens as a result of converting the static document shown in FIG. 23A into a dynamic document.
  • FIG. 24 is a diagram showing a further example of a static document that can be converted into a dynamic document in the present invention.
  • FIGS. 25A and 25B are diagrams showing specific examples of the static document to which the present invention is applied.
  • FIGS. 26A and 26B are diagrams schematically showing the arrangement of the blocks in the static document shown in FIGS. 25A and 25B.
  • FIG. 27 is a diagram illustrating a result of performing a row extraction process on the static document illustrated in FIG. 25A by a conventional method.
  • FIG. 28 is a diagram illustrating a result when sentences are extracted from the static document illustrated in FIG. 25A in the order in which character strings are arranged in the document by a conventional method.
  • PC 1 is operated based on information input to input section 103.
  • the input unit 103 includes a mouse, a keyboard, and the like, and receives various instructions from a user.
  • the PC 1 is connected to the external storage device 102.
  • the external storage device 102 can read and write information recorded on recording media such as FD (flexible disk) and HD (hard disk).
  • the PC 1 can operate according to a program recorded on a recording medium and read by the external storage device 102.
  • the external storage device 102 reads information on a recording medium that stores static document data (data of a document in a static document format) handled by the PC 1. Further, the dynamic document data (data of a document in a dynamic document format) converted from the static document data by the PC 1 can be recorded on a recording medium.
  • The programs and data recorded on the recording medium handled by the external storage device 102 are loaded into the program memory 105 and the data memory 106 of the PC 1 and processed by the CPU (Central Processing Unit) 104. The result is displayed on the display unit 107 under the control of the display control unit 101, or is stored in the external storage device 102.
  • PC 1 shown in FIG. 1 is a configuration of a general personal computer, and the configuration of PC 1 is not limited to the configuration shown in FIG. Further, in the present embodiment, the description will be made assuming that the document processing apparatus is a personal computer whose configuration is shown in FIG.
  • A mobile terminal such as a PDA (Personal Digital Assistant) may also be used.
  • The input unit 103, the external storage device 102, and/or the display unit 107 may be directly connected to the PC 1, or may be connected via a network. Devices having the same functions as these may also be built into the PC 1.
  • The PC 1 performs conversion processing that converts static document data into dynamic document data. That is, data in a document format that includes the position information of each object such as characters and images (static document data) is converted into data in a document format that does not include position information (dynamic document data).
  • input unit 103 receives an input of static document data 10 from external storage device 102, and based on this, CPU 104 stores data in data memory 106. Temporarily save the entered data.
  • The character information extraction unit 1011, the block area determination unit 1012, the inter-block-area character connection determination unit 1013, and the correct document creation unit 1014 of the program memory 105 operate in conjunction with the character information buffer 1021 and the block area buffer 1022 of the data memory 106 to convert the static document data 10 into dynamic document data 20.
  • The converted dynamic document data 20 is output to the display unit 107 or the external storage device 102 under the control of the display control unit 101.
  • The character information extraction unit 1011, the block area determination unit 1012, the inter-block-area character connection determination unit 1013, and the correct document creation unit 1014 are included in the program memory 105.
  • The process shown in FIG. 3 is realized by the CPU 104 of the PC 1 reading and executing the program stored in the program memory 105.
  • the conversion processing shown in FIG. 3 will be described with reference to the block diagram of FIG.
  • In step S10 (hereinafter, the word “step” is omitted), the PC 1 accepts the input of the static document data 10 from the external storage device 102 at the input unit 103.
  • In addition to the position information described above, the static document data 10 stores attribute information such as vertical/horizontal writing information, font name, underline, italic, bold, character color, and background color.
  • The character information extraction unit 1011 extracts the character information of each character from the input static document data 10 (S11).
  • The method of extracting character information in the character information extraction unit 1011 is similar to that of a general printer driver.
  • The extraction may interpret data in a common format converted by a kernel module such as GDI, or it may interpret the static document data 10 directly. The method of extracting character information here can be chosen arbitrarily.
  • The character information extracted in S11 is stored in the character information buffer 1021.
  • The block area determination unit 1012 determines block areas based on the character information extracted in S11 (S12). The method of determining the block areas in S12 will be described later in detail.
  • The inter-block-area character connection determination unit 1013 determines, for each block area determined in S12, which other block area its characters can be connected to (S13). The method of determining character connection between block areas in S13 will be described later in detail.
  • A correct document is created by connecting the characters between the block areas (S14). The method of creating the correct document in S14 will be described later in detail.
  • The dynamic document data 20 including the correct document created in S14 is output to the display unit 107 or the external storage device 102 (S15).
  • character information was extracted from the static document data 10.
  • information stored in the character information buffer 1021 including the character information, will be described with reference to FIG.
  • FIG. 4 schematically shows a part of the information stored in the character information buffer 1021, including the character information and the detailed information of the character information (information details A and B).
  • Character strings are extracted from the static document data stored in the external storage device 102 block by block, where each block is at least a unit in which the characters are arranged correctly and continuously.
  • The extracted character strings are numbered as block (12), block (5), and so on, and all of them, from the first character string (block (1)) to the final character string, are stored in the character information buffer 1021.
  • Each block has font information 11 and attribute information 12 as information common to all characters in the block (block common information), shown in FIG. 4 as information details A and information details B. In addition, each character in each block has a character code 13 and start coordinate/end coordinate 14 as detailed character information given independently of the other characters.
  • Each block is a character string whose characters share common information such as font information and attribute information. Within each block, the characters are arranged correctly. In the character information buffer 1021, however, the blocks are not arranged in the correct order. For this reason, as shown in FIG. 4, even if the blocks in the character information buffer 1021 are extracted and their character strings are arranged as they are, the result is not a correct sentence.
  • The structure of the character information is not limited to that shown in FIG. 4 as information details A and information details B.
  • the font information 11 may be the name of the font information itself or the address of a table indicating the font information. Which one to adopt can be determined arbitrarily.
  • the format is such that font information is stored for each character.
  • The attribute information 12 is information on the attributes of each character, such as vertical/horizontal writing information, underline, italic, bold, character color, and background color.
  • FIG. 5 is a diagram schematically showing the information stored in the block area buffer 1022, which is provided to store the information of the block areas determined in the block area determination processing.
  • The block area buffer 1022 includes a plurality of block areas such as areas 1022A to 1022C and 1022N. Each block area includes a pointer to a block of the character information buffer 1021 and the start coordinate/end coordinate of the block area. The pointer and coordinates of each block area are set by the block area determination processing.
  • FIG. 6 shows a flowchart of a subroutine of the block area determination processing (S12).
  • Blocks unsuitable for the determination are excluded from the character elements in accordance with a predetermined rule (S20).
  • Such blocks include, for example, blocks containing characters or images that are too large to be regarded as characters or as components of a block.
  • A block variable i and a character variable j are used. Then, 0 is substituted for the block variable i (S21). This is because
  • A pointer to the block (i) of the character information buffer 1021 is set in the block area (i) (S22).
  • Setting a pointer means that one of the plurality of block areas shown in FIG. 5 as areas 1022A to 1022C and 1022N, namely block area (i), is given a “pointer to block (i)” as depicted in FIG. 5. This associates block (i) with block area (i). That is, in the present embodiment, a block and a block area bearing the same number are associated with each other (block (0) and block area (0), block (1) and block area (1), and so on).
  • i and j are numbers for specifying the block area and the characters in the block area, respectively.
  • the j-th character is set as character variable j.
  • The processing of S26 is repeated for each block as many times as there are characters in the block, so that the block area comes to have a size corresponding to the characters included in the corresponding block.
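  • The block area determination described above can be pictured with a minimal sketch. It assumes that a block area is simply the smallest rectangle enclosing the start and end coordinates of every character in the block; the text does not spell out the exact computation of S26, and the names `Char`, `block_area`, and `determine_block_areas` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Char:
    code: str   # character code
    x0: float   # start coordinate (left)
    y0: float   # start coordinate (top)
    x1: float   # end coordinate (right)
    y1: float   # end coordinate (bottom)


def block_area(chars):
    """Return (x0, y0, x1, y1) enclosing every character of one block."""
    x0 = y0 = float("inf")
    x1 = y1 = float("-inf")
    for ch in chars:  # one pass per character, as S26 is repeated per character
        x0 = min(x0, ch.x0)
        y0 = min(y0, ch.y0)
        x1 = max(x1, ch.x1)
        y1 = max(y1, ch.y1)
    return (x0, y0, x1, y1)


def determine_block_areas(blocks):
    """Associate block (i) with block area (i) by index (assumed stand-in for the pointer)."""
    return [{"block": i, "area": block_area(chars)} for i, chars in enumerate(blocks)]
```

  • Because a block that wraps onto a second line yields a rectangle spanning both lines, rectangles computed this way can overlap one another.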
  • FIG. 7 schematically shows the static document data of FIG. 25A together with the names of the block areas determined for all the blocks included in the static document data by the block area determination processing described above. In the figure, each block area is surrounded by a line.
  • Each block area is surrounded by a rectangle of a different line type such as a dotted line, a dashed line, a solid line, or a dashed line so as to be distinguished from other adjacent block areas.
  • the determined block area may overlap with another block area.
  • The present invention uses this overlap to extend the connection of correctly ordered characters.
  • the block area variable i is set to 0 (S30). This is for considering combinations of all block areas.
  • Next, i is substituted for the block area variable j (S31), and 1 is added to j (S32). If j is less than the number of block areas (YES in S33), it is determined whether the last character of block area (i) and the first character of block area (j), or the first character of block area (i) and the last character of block area (j), are adjacent on the same line with no other block area between them (S34). If the condition is satisfied (YES in S34), block area (i) and block area (j) are recorded as being connected in the character sequence (S35). If the condition in S34 is not satisfied, the process returns to S32.
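  • The pairwise check of S30 to S35 can be sketched as follows for horizontal writing. This is an illustration under assumptions, not the procedure of the specification: two characters are taken to be on the same line when their boxes overlap vertically, no maximum gap is enforced for "adjacent", and a third block area counts as lying between them when its rectangle intersects the gap. Each area is a hypothetical dict holding `rect`, `first`, and `last` boxes as `(x0, y0, x1, y1)`.

```python
from itertools import combinations


def same_line(a, b):
    """Assumed test: two character boxes share a line if they overlap vertically."""
    return min(a[3], b[3]) > max(a[1], b[1])


def intersects(r, s):
    """True if rectangles r and s overlap."""
    return r[0] < s[2] and s[0] < r[2] and r[1] < s[3] and s[1] < r[3]


def gap_between(left, right):
    """Rectangle covering the horizontal gap between two boxes on one line."""
    return (left[2], max(left[1], right[1]), right[0], min(left[3], right[3]))


def should_connect(i, j, areas):
    a, b = areas[i], areas[j]
    # Try "last of i, first of j" and then "last of j, first of i".
    for tail, head in ((a["last"], b["first"]), (b["last"], a["first"])):
        if not same_line(tail, head) or head[0] < tail[2]:
            continue
        gap = gap_between(tail, head)
        others = (c for k, c in enumerate(areas) if k not in (i, j))
        if not any(intersects(gap, c["rect"]) for c in others):
            return True
    return False


def find_links(areas):
    """Examine every pair (i, j) with i < j and record the pairs to connect."""
    return [(i, j) for i, j in combinations(range(len(areas)), 2)
            if should_connect(i, j, areas)]
```

  • With three block areas side by side on one line, this records the two neighbouring pairs but not the outer pair, since the middle area lies in the gap.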
  • a block area (i) serving as a start of the document is searched (S40).
  • For a horizontally written document, the block area at the leftmost top is used.
  • For a vertically written document, the block area at the rightmost top is the block area that starts the document.
  • The character string in the block area (i) is extracted (S41).
  • To extract the character string of the block area (i) means to grasp the area of the characters belonging to the block (i) of the block area (i). This processing is executed so that this position information can be used in the processing from S42 onward.
  • The process of searching for a block area (j) that follows the block area (i) in the document looks among the remaining block areas whose character strings have not yet been extracted. For a horizontally written document, it finds the topmost block area and, if there are several, the leftmost of them; for a vertically written document, it finds the rightmost block area and, if there are several, the topmost of them.
  • The leftmost-top or rightmost-top block area can be found by comparing the leftmost-top or rightmost-top positions of all the blocks in the static document currently being processed.
  • the character sequence starts from the top left and ends at the bottom right for horizontal writing, and starts at the top right and ends at the bottom left for vertical writing. This is because, when the continuity of the block is inferred, it is necessary to find the leftmost top block area or the rightmost top block area.
  • The character string included in the block area (j) found in S42 or S43 is extracted (S44), and the character strings of the block area (i) and the block area (j) are concatenated (S45).
  • That is, the character strings of the two block areas are concatenated. For example, when the character string of block area (1) in FIG. 25A, “FIG. 1 shows the structure of the present invention”, and the character string of block area (2), “blocks”, are connected, the result is “FIG. 1 is a block showing the configuration of the present invention”.
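  • A rough sketch of the correct-document creation for horizontal writing follows. The specification does not say exactly how S42 and S43 choose the next block area, so the choice here is an assumption: a recorded connection is followed first, and otherwise the topmost (then leftmost) remaining block area is taken. The function names and the `links` mapping are hypothetical.

```python
def reading_key(area):
    """Sort key for horizontal writing: topmost first, then leftmost."""
    x0, y0, _, _ = area["rect"]
    return (y0, x0)


def create_document(areas, links):
    """Concatenate block-area strings in reading order.

    areas: list of {"rect": (x0, y0, x1, y1), "text": str}
    links: dict {i: j} meaning block area j was recorded as following i
    """
    remaining = set(range(len(areas)))
    # Start from the block area at the leftmost top (S40).
    current = min(remaining, key=lambda k: reading_key(areas[k]))
    remaining.remove(current)
    text = areas[current]["text"]
    while remaining:
        nxt = links.get(current)
        if nxt not in remaining:
            # No usable recorded connection: fall back to position.
            nxt = min(remaining, key=lambda k: reading_key(areas[k]))
        remaining.remove(nxt)
        text += areas[nxt]["text"]  # concatenate the two strings (S45)
        current = nxt
    return text
```

  • For a vertically written document the sort key would instead prefer the rightmost and then the topmost block area.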
  • the processing is performed in S47.
  • a static document can be converted to a dynamic document.
  • The static document data shown in FIG. 10 is converted into dynamic document data that can be displayed in the small display unit 201 of the portable terminal 200 in FIG.
  • line feed information can be added to dynamic document data generated by conversion from static document data.
  • Although static document data contains the position information of each object, it does not include information on where line breaks occur. Therefore, when creating dynamic document data, which does not include position information, line feed data must be added. A process for converting static document data into dynamic document data with line feed information added is described below.
  • The PC 1 of the present embodiment differs from the PC 1 of the first embodiment in that it further includes a line dividing unit, a line feed position determining unit, and a unit for creating a correct document with line feeds.
  • In S53, it is determined whether j is less than the number of block areas. If so (YES in S53), it is determined in S54 whether there is a record, made in S35 (see FIG. 8) of the inter-block-area character connection determination processing, indicating that block area (i) and block area (j) are connected in the character sequence. If there is such a record (YES in S54), block area (i) and block area (j) are merged (S55), and after merging, the buffer of block area (j) is deleted. If there is no such record (NO in S54), the process returns to S52.
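  • The merging loop can be sketched as below. It assumes that merging two block areas means taking the rectangle enclosing both and pooling their blocks, which the text does not state explicitly; the `owner` table is a hypothetical device for following chains of merges, and the merged blocks are listed in link order rather than reading order.

```python
def merge_block_areas(areas, links):
    """Merge block areas recorded as connected.

    areas: list of {"rect": (x0, y0, x1, y1), "blocks": [block numbers]}
    links: iterable of (i, j) pairs recorded in S35
    Returns the surviving block areas; the inputs are left unchanged.
    """
    merged = [{"rect": a["rect"], "blocks": list(a["blocks"])} for a in areas]
    owner = list(range(len(areas)))  # surviving area each index belongs to

    def find(k):
        while owner[k] != k:
            k = owner[k]
        return k

    for i, j in sorted(links):
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # already merged through an earlier link
        keep, drop = min(ri, rj), max(ri, rj)
        a, b = merged[keep], merged[drop]
        a["rect"] = (min(a["rect"][0], b["rect"][0]), min(a["rect"][1], b["rect"][1]),
                     max(a["rect"][2], b["rect"][2]), max(a["rect"][3], b["rect"][3]))
        a["blocks"] += b["blocks"]
        owner[drop] = keep  # corresponds to deleting the buffer of the merged area
    return [m for k, m in enumerate(merged) if owner[k] == k]
```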
  • the block area variable i is set to 0 in S60. This is for performing the line division processing for all the block areas.
  • the block area variable i is set to 0. This is for determining the line feed position for all the block areas.
  • the investigation target is set to the i-th block area.
  • the investigation target means the processing target of S72 to S82 described later.
  • the average position P of the first character of the line in the block area (i) is determined in the line direction.
  • the line direction is the direction in which the characters are arranged in the document, that is, the horizontal direction when the document is written horizontally, and the vertical direction when the document is written vertically.
  • the average value of the character size in the block area (i) is obtained in the line direction. This average value is calculated from the character sizes obtained from the items of detailed information shown in FIG.
  • the line variable j of the block area (i) is set to 1 in order to perform the line break position determination processing for all the rows after the second row of the block area (i) (S74).
  • the target is the j-th row (S75).
  • the average position P of the line beginnings in the block is considered to lie below the midpoint between the beginning of the first line and the smallest line-beginning position.
  • the beginning of the sentence should be after the position of P plus half the average value of the character size.
  • FIG. 17 shows the state of a certain block after performing the block area merging process.
  • the dotted lines in FIG. 17 are not actually displayed, but are shown for the sake of convenience in order to make the description of the figure easier to understand.
  • “0”, “2”, “10”, “8”, “4”, and the like described in FIG. 17 are numbers indicating the character size, and are not actually displayed.
  • the meaning of the description of the dotted lines and numerals is the same for FIG. 18 described later.
  • the number of characters in the block shown in FIG. 17 is 16, and the character sizes are 8, 8, 4, 6, 8, 8, 8, 6, 8, 8, 7, 8, 7, 7, 7, and 4, respectively.
  • the average character size is calculated as (8 + 8 + 4 + 6 + 8 + 8 + 8 + 6 + 8 + 8 + 7 + 8 + 7 + 7 + 7 + 4) / 16 = 7.
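The arithmetic of the FIG. 17 example can be checked with a few lines of Python. The function name `average_char_size` and the constant `FIG17_SIZES` are illustrative only; the sixteen values are the ones that appear in the sum above.

```python
def average_char_size(sizes):
    """Return the arithmetic mean of the character sizes in one block area."""
    if not sizes:
        raise ValueError("a block area must contain at least one character")
    return sum(sizes) / len(sizes)


# The sixteen character sizes summed in the FIG. 17 example.
FIG17_SIZES = [8, 8, 4, 6, 8, 8, 8, 6, 8, 8, 7, 8, 7, 7, 7, 4]

print(average_char_size(FIG17_SIZES))  # 112 / 16 -> 7.0
```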
  • the beginning of the sentence may be shifted one character before, as shown in FIG.
  • the average position P of the line beginnings will lie before the midpoint between the beginning of the first line and the smallest beginning position among the lines other than the first.
  • the foremost part of the character at the beginning of the sentence should exist before the position obtained by subtracting half of the average value of the character size from P.
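The two positional tests described above (a sentence start lying after P plus half the average character size, or before P minus half the average character size) can be sketched as follows. This is a hedged illustration, not the patented procedure itself: the function name `find_sentence_starts` and the list of line-start coordinates are hypothetical, and applying both tests in a single pass is a simplification, since the text discusses the indented and the shifted-forward layouts separately.

```python
def find_sentence_starts(line_starts, avg_char_size):
    """Return the indices of lines (second line onward) judged to start a sentence.

    `line_starts` holds the line-direction coordinate of the first character
    of each line in one block area (a hypothetical input format).
    """
    if not line_starts:
        return []
    p = sum(line_starts) / len(line_starts)  # average line-start position P
    half = avg_char_size / 2
    starts = []
    for j in range(1, len(line_starts)):  # the line variable j starts at 1
        pos = line_starts[j]
        # Indented start: after P + half. Shifted-forward start: before P - half.
        if pos > p + half or pos < p - half:
            starts.append(j)
    return starts
```

With line starts `[8, 0, 0, 0, 8, 0]` and an average character size of 8, P is about 2.67, so only the line at index 4 lies beyond P + 4 and is reported.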
  • a block area (i) serving as a start of the document is searched (S90).
  • the top leftmost block area is set as the start block when the document is written vertically.
  • the row variable n of the block area (i) is set to 0. This is because a correct answer document with line breaks must be created for all lines in the block area (i).
  • it is determined whether the rear end of the line preceding the line to be connected is a line feed insertion position recorded in the line feed position determination processing (S18: see FIG. 13). If it is determined to be a line feed position, line feed information is inserted (S94), and the process proceeds to S95. On the other hand, if it is determined not to be a line feed position, the process proceeds directly to S95.
  • it is determined whether n is less than the number of rows in the block area (i). If n is less than the number of rows in the block area (i) (YES in S96), the process returns to S92. On the other hand, if n has reached the number of rows of the block area (i), a block area (j) that the document is expected to continue into after the block area (i) is searched for (S97).
  • the block area (j) that the document continues into after the block area (i) is, for example, among the remaining block areas from which the character string has not yet been extracted, the leftmost top block area when the document is written horizontally, and the rightmost top block area when the document is written vertically.
  • in S98 the same processing as the above-described processing of S91 to S96 (processing in the area surrounded by the broken line W in FIG. 19) is performed. However, in each process of S91 to S96 executed as S98, i is replaced by j.
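The line-by-line assembly of S91 to S98 can be sketched as below. The sketch assumes that the block areas have already been put in reading order and that the recorded line feed positions are available as a set of `(block index, line index)` pairs. Both data structures and the name `build_document` are hypothetical. It also assumes that a recorded position at the end of a block's last line applies before the first line of the next block, which the text does not state explicitly.

```python
def build_document(blocks, break_after):
    """Concatenate the lines of all block areas, inserting recorded line feeds.

    `blocks` is a list of block areas in reading order, each a list of line
    strings. `break_after` contains (block, line) pairs whose rear end was
    recorded as a line feed insertion position.
    """
    parts = []
    previous = None  # (block, line) key of the line connected last
    for b, lines in enumerate(blocks):
        for n, line in enumerate(lines):
            # Is the rear end of the previous line a recorded line feed position?
            if previous is not None and previous in break_after:
                parts.append("\n")  # insert line feed information (S94)
            parts.append(line)  # connect the line
            previous = (b, n)
    return "".join(parts)
```

For example, `build_document([["abc", "def"], ["ghi"]], {(0, 1)})` joins the first two lines without a break and starts a new line before the second block.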
  • the present invention can be applied by performing a process such as a statement.
  • the method described in the present embodiment is used when searching for a block area (j) which is a document that will follow the block area (i) in the process S43 of generating the correct document described with reference to FIG.
  • the method described above for finding the uppermost block area among the leftmost block areas is slightly modified.
  • it is determined whether another block area exists inside the block area (i), whose extent is ((X1(i)min, Y1(i)min), (X1(i)max, Y1(i)max)). This determination is executed by checking whether the upper left position (X1(k)min, Y1(k)min) of another block area (k) exists at the lower right of the upper left position (X1(i)min, Y1(i)min) of the block area (i), and whether the lower right position (X1(k)max, Y1(k)max) of the block area (k) exists at the upper left of the lower right position (X1(i)max, Y1(i)max) of the block area (i).
  • in step S113, the block area (k) is determined to be the block area (j) that the document continues into after the block area (i), the processing ends, and the flow returns to S44 in FIG.
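The containment test of S110 and the selection in S113 can be sketched as follows. The `Block` tuple, the function names, and the sample coordinates are hypothetical. The sketch assumes a coordinate system in which x grows to the right and y grows downward, and it treats touching edges as "inside", neither of which is specified in the text.

```python
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    x_min: float  # upper left corner
    y_min: float
    x_max: float  # lower right corner
    y_max: float


def contains(outer: Block, inner: Block) -> bool:
    """True if `inner` lies inside `outer` (y is assumed to grow downward)."""
    upper_left_ok = inner.x_min >= outer.x_min and inner.y_min >= outer.y_min
    lower_right_ok = inner.x_max <= outer.x_max and inner.y_max <= outer.y_max
    return upper_left_ok and lower_right_ok


def find_inner_block(i: int, blocks: List[Block]) -> Optional[int]:
    """Return the index k of a block area inside block area i, or None."""
    for k, candidate in enumerate(blocks):
        if k != i and contains(blocks[i], candidate):
            return k  # this block area is treated as block area (j)
    return None
```

With a large area `Block(0, 0, 100, 100)` and a smaller area `Block(60, 10, 90, 40)` placed inside it, `find_inner_block(0, ...)` returns the index of the smaller area, while the reverse query returns `None`.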
  • the image information is extracted, and in S45, the image information of the block area (j) is linked after the block area (i).
  • when the areas 2001 and 2002 shown in FIG. 22A are a block area (1) and a block area (2), respectively, the block area (2) exists inside the block area (1), so it is determined in S110 described above that a block area (k) exists.
  • the display of the display unit 107 may have, for example, a layout as shown in FIG. 23A.
  • a layout that straddles the block area (1), regarded as the block area (i), and a further block area (3) is also conceivable.
  • the block area (1) is shown as an area 2001
  • the block area (3) is shown as an area 2003.
  • the block area (k) is determined to be the block area (j) that the document continues into after the block area (i), the processing is terminated, and the process returns to S44 in FIG.
  • the image information is extracted, and in S45, the image information in the block area (j) is linked after the image information in the block area (i).
  • in S112, the leftmost block areas are searched, the uppermost block area among them is found and determined as the block area (j), and the processing ends.
  • in FIG. 24, annotations and columns, instead of illustrations, are inserted as the area 2002 (block area (2)) within the area 2001 (block area (1)). Needless to say, the same process can be used to convert this into a correct dynamic document. Also in FIG. 24, the dotted line is not actually displayed and is shown only for convenience of explanation of the figure. In FIG. 24, the annotations and columns in the area 2002 are intended to exist as a document expanded on a bitmap, not as a text document.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The present invention relates to a data conversion device in which, first, in step S10, an input of static document data is received from an external storage device. Next, in step S11, character information on each character is extracted from the input static document data. In step S12, a block area is determined according to the character information extracted in step S11. Next, in step S13, for the block determined in step S12, it is determined with which block area characters can be concatenated. Next, in step S14, the characters are concatenated between the block areas determined in step S13, thereby creating a correct answer document. In step S15, dynamic document data including the correct answer document created in step S14 is output to a display unit or to the external storage device.
PCT/JP2003/015565 2002-12-06 2003-12-04 Data conversion device, data conversion method, and recording medium containing data conversion program Ceased WO2004053724A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2003289192A AU2003289192A1 (en) 2002-12-06 2003-12-04 Data conversion device, data conversion method, and recording medium containing data conversion program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2002/355111 2002-12-06
JP2002355111 2002-12-06

Publications (1)

Publication Number Publication Date
WO2004053724A1 true WO2004053724A1 (fr) 2004-06-24

Family

ID=32500779

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2003/015565 Ceased WO2004053724A1 (fr) 2002-12-06 2003-12-04 Data conversion device, data conversion method, and recording medium containing data conversion program

Country Status (2)

Country Link
AU (1) AU2003289192A1 (fr)
WO (1) WO2004053724A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210303790A1 (en) * 2020-03-27 2021-09-30 Fujifilm Business Innovation Corp. Information processing apparatus

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0424782A (ja) * 1990-05-15 1992-01-28 Canon Inc Document processing device
JP2000293521A (ja) * 1999-04-09 2000-10-20 Canon Inc Image processing method, apparatus, and storage medium
JP2002526862A (ja) * 1998-10-01 2002-08-20 BCL Computers, Inc. Conversion of data representing a document to other formats for manipulation and display

Also Published As

Publication number Publication date
AU2003289192A1 (en) 2004-06-30

Similar Documents

Publication Publication Date Title
RU2357284C2 (ru) Method of processing digital handwritten notes for recognition, anchoring and reformatting of digital handwritten notes, and system for implementing it
US6336124B1 (en) Conversion data representing a document to other formats for manipulation and display
US6952803B1 (en) Method and system for transcribing and editing using a structured freeform editor
US8225200B2 (en) Extracting a character string from a document and partitioning the character string into words by inserting space characters where appropriate
US20040202352A1 (en) Enhanced readability with flowed bitmaps
JP2002082937A (ja) Classification, anchoring, and conversion of ink
US20110173532A1 (en) Generating a layout of text line images in a reflow area
JPH09305351A (ja) Character string extraction system and program storage medium
JP4780169B2 (ja) Data generation device, scanner, and computer program
US20240104290A1 (en) Device dependent rendering of pdf content including multiple articles and a table of contents
US12248747B2 (en) Device dependent rendering of PDF content
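JP2004086621A (ja) Electronic device, display control method, program, and recording medium
WO2004053724A1 (fr) Data conversion device, data conversion method, and recording medium containing data conversion program
JPH10124494A (ja) Information processing device and annotation adding method
KR20050061620A (ko) Equation editor of document editing software and editing method thereof
JP3943582B2 (ja) Parallel translation sentence alignment device
KR101159323B1 (ko) Handwriting input for Asian languages
JP2006107155A (ja) Document structuring processing device, document structuring processing method, and program for causing a computer to execute the method
KR100842107B1 (ko) Document display apparatus and method for a portable information terminal
JP3176588B2 (ja) Handwritten character input conversion device, document creation device, and computer-readable recording medium
JP6765113B2 (ja) Character string processing device, character string processing method, character string processing program, and computer-readable recording medium
JPH0778800B2 (ja) Document processing device
JP3139955B2 (ja) Information processing method and apparatus
JP2845235B2 (ja) Document data processing device
JP2845234B2 (ja) Document data processing device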

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP

122 Ep: pct application non-entry in european phase