HK1057111A1 - Systems and methods for digital document processing - Google Patents
- Publication number
- HK1057111A1
- Authority
- HK
- Hong Kong
- Prior art keywords
- file
- content
- data
- objects
- source data
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/106—Display of layout of documents; Previewing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Processing Or Creating Images (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Document Processing Apparatus (AREA)
Abstract
Display technologies are described that separate the underlying functionality of an application program from the graphical display process, thereby eliminating or reducing the application's need to control the device display and to provide graphical user interface tools and controls for the display. Additionally, such systems reduce or eliminate the need for an application program to be present on a processing system when displaying data created by or for that application program, such as a document or video stream. Thus it will be understood that in one aspect, the systems and methods described herein can display content, including documents, video streams, or other content, and will provide the graphical user functions for viewing the displayed document, such as zoom, pan, or other such functions, without need for the underlying application to be present on the system that is displaying the content. The advantages over the prior art of the systems and methods described herein include allowing different types of content from different application programs to be shown on the same display within the same work space.
Description
RELATED APPLICATIONS
This application claims priority from UK patent application No. 0009129.8, filed on 14/4/2000, and US patent application No. 09/703502, filed on 31/10/2000, both by Majid Anwar, the contents of both of which are incorporated herein by reference.
Technical Field
The present invention relates to data processing systems, and more particularly to a method and system for processing digital files to produce an output representation of the source file, as a visual display, as hard copy, or in other display formats.
Background
The term "digital file" as used herein is used to describe a digital representation of any type of data processed by a data processing system that is ultimately output, in whole or in part, to a human user in some form, typically by being displayed or visually reproduced (e.g., by a visual display unit or printer), or by text-to-speech conversion, etc. The digital file may include any representable feature, including but not limited to: a text; a graphical image; an animated graphic image; full motion video images; interactive icons, buttons, menus, or hyperlinks. The digital file may also include non-visual elements such as audio (sound) elements.
Data processing systems, such as personal computer systems, typically require the processing of "digital files," which may originate from any one of a number of local or remote sources and which may exist in any one of a number of data formats ("file formats"). In order to generate an output version of the file, whether as a visual display or printed copy, for example, the computer system is required to interpret the original data file and generate an output compatible with the associated output device (e.g., monitor or other visual display device or printer). Typically, this process involves an application program for interpreting the data files, the operating system of the computer, a software "driver" specific to the desired output device, and in some cases (particularly for a monitor or other visual display device) ancillary hardware in the form of an expansion card.
Conventional methods for processing digital files to produce output are inefficient in terms of hardware resources, software overhead, and processing time, and are entirely unsuitable for low-power, portable data processing systems, including wireless telecommunications systems, and for low-cost data processing systems, such as network terminals and the like. Other problems are encountered in conventional digital document processing systems, including the need to configure multiple system components (including hardware and software components) to interact in a desired manner, and inconsistencies in the processing of the same original material through different systems (e.g., differing in formatting, color reproduction, etc.). Furthermore, conventional methods for digital file processing cannot take advantage of the versatility and/or reusability of file format components.
Disclosure of Invention
It is an object of the present invention to provide a digital document processing method and system, and apparatus incorporating the same, which obviates or mitigates the disadvantages of conventional methods and systems described above.
The systems and methods described herein provide a display technique that separates the underlying functionality of an application from the graphical display process, thereby eliminating or reducing the need for applications to control device displays and provide graphical user interface tools and controls for displays. In addition, such a system reduces or eliminates the need for an application program when displaying data (e.g., files or video streams) created by or for the application program on a processing system. It will thus be appreciated that in one aspect, the systems and methods described herein may display content, including files, video streams, or other content, and will provide graphical user functionality, such as zoom, pan, or other such functionality, for viewing the displayed files without the need for an underlying application to be present on the system displaying the content. The benefits of the systems and methods described herein over the prior art include allowing different types of content from different applications to be displayed on the same display within the same workspace. Many benefits will be apparent to those of ordinary skill in the art and many ways of using the underlying techniques of the present invention can be appreciated by those of ordinary skill in the art to create additional systems, devices, and applications. Such modified and alternative systems and implementations should be understood to be within the scope of the present invention.
More particularly, the systems and methods described herein include a digital content processing system that includes an application scheduler module (dispatcher) for receiving an input byte stream representing source data in one of a number of predetermined data formats and for associating the input byte stream with one of the predetermined data formats. The system may also include a file agent module (document agent) for interpreting the input byte stream as a function of the associated predetermined data format and for parsing the input byte stream into a file object stream that provides an internal representation of the underlying structure within the input byte stream. The system also includes a core file engine for converting the file objects into an internal representation data format and for mapping the internal representation to a location on the display. A shape processor within the system processes the internal representation data to drive an output device to render the content in a manner expressed by the internal representation.
According to the present invention, there is provided a method for representing digital content, the method comprising the steps of: (a) receiving source data representing the digital content; (b) processing the source data to translate the source data into an internal representation of the digital content, the internal representation having a predetermined format; and the source data processing comprises: (c) identifying objects that appear within the source data; (d) creating a file object for each object identified within the source data, the file object representing an internal representation of the encountered object, and the file object (120) separating the structure of the object from the data content of the object; grouping the file objects into a file structure, the file structure representing a structure of the digital content; (g) grouping data content of the objects into a data content structure; and (e) providing a set of pointers that associate the file objects in the file structure with the data content stored in the data content structure; the method further comprises the following steps: receiving source data representing the digital content in any one of a plurality of predetermined formats; and further comprising: prior to performing step (a), providing a plurality of file proxy modules, each of said file proxy modules for translating source data in at least one of said plurality of formats into said internal representation format; after the following step and before the step, identifying a particular file proxy module from the plurality of file proxy modules, the particular file proxy module for translating current source data into a format of the internal representation; and performing the processing of step (b) using the particular file proxy module, whereby the format of the internal representation is independent of the format of the source data.
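The separation of structure from data content described above can be sketched as follows. This is a minimal illustration only, not the patented implementation: the names `FileObject`, `ContentStore`, `content_ref` and `build_internal_representation` are hypothetical, and a list index stands in for the "pointer" that associates a file object with its stored content.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FileObject:
    """Structural node: records what kind of object this is, not its data."""
    kind: str                           # e.g. "document", "paragraph"
    content_ref: Optional[int] = None   # hypothetical "pointer" into the store
    children: List["FileObject"] = field(default_factory=list)


class ContentStore:
    """Data content structure: holds payloads apart from the file structure."""

    def __init__(self) -> None:
        self._items: List[str] = []

    def add(self, payload: str) -> int:
        self._items.append(payload)
        return len(self._items) - 1     # the index serves as the pointer

    def get(self, ref: int) -> str:
        return self._items[ref]


def build_internal_representation(paragraphs: List[str]) -> Tuple[FileObject, ContentStore]:
    """Create one file object per identified paragraph and store its text separately."""
    store = ContentStore()
    root = FileObject(kind="document")
    for text in paragraphs:
        root.children.append(FileObject(kind="paragraph", content_ref=store.add(text)))
    return root, store


root, store = build_internal_representation(["Hello", "World"])
resolved = [store.get(child.content_ref) for child in root.children]
```

Because the structure holds only references, the same structural tree could in principle be paired with a different content store, or walked without touching the content at all.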
According to the present invention there is provided a system for representing digital content, the system comprising: an input mechanism for receiving source data representing the digital content; means for processing said source data to translate said source data into an internal representation of said digital content, said internal representation having a predetermined format; and the source data processing comprises: identifying objects that appear within the source data; creating a file object for each object within the source data, the file object representing an internal representation of the encountered object, and the file object separating the structure of the object from the data content of the object; grouping the file objects into a file structure, the file structure representing a structure of the digital content; grouping data content of the objects into a data content structure; and providing a set of pointers that associate said file objects in said file structure with said data content stored in said data content structure; wherein: the input mechanism is for receiving source data representing the digital content in any one of a plurality of predetermined formats; said means for processing said source data comprises a plurality of file proxy modules, each of said file proxy modules for translating source data in at least one of said plurality of formats into said internal representation format; the input mechanism identifies a particular file proxy module from the plurality of file proxy modules, the particular file proxy module for translating the current source data into the internal representation format; and the particular file proxy module performs processing of the source data, whereby the format of the internal representation is independent of the format of the source data.
Embodiments of the present invention will now be described, by way of example only, with reference to the accompanying drawings.
Brief Description of Drawings
The above and other objects and advantages of the present invention will become more apparent from the following further description with reference to the accompanying drawings, in which:
FIG. 1 is a block diagram illustrating one embodiment of a digital document processing system in accordance with the present invention;
FIG. 2 is a block diagram that presents the system depicted in FIG. 1 in more detail;
FIG. 3 is a flow diagram of a file proxy module;
FIG. 4 schematically depicts an exemplary file of the type that may be processed by the system of FIG. 1;
FIG. 5 depicts a flowchart of two exemplary processes used to reduce redundancy in an internal representation of a file; and
FIGS. 6-8 depict exemplary data structures for storing an internal representation of a processed source file.
Detailed Description
The systems and methods described herein include computer programs that operate to process an output stream or output file that is generated by an application program for providing the output on an output device (e.g., a video display). The application according to the invention may process the streams to create an internal representation of the output and may further process the internal representation to produce a new output stream that may be displayed on an output device as the output produced by the application of the invention. Thus, the system of the present invention separates the application from the display process, thereby freeing the application from having to display its output on a particular display device, and further eliminating the need to provide the application when processing the output in order to display the output of the application.
To illustrate this operation, FIG. 1 provides a high-level functional block diagram of a system 10 that allows several applications (collectively shown as elements 13) to pass their output streams to a computer process 8, which processes those output streams and produces representations of the aggregate outputs created by those streams for display on a device 26. The collective output of the applications 13 is illustrated in FIG. 1 by an output device 26, which renders the output content produced by the different applications 13. As will be appreciated by those of ordinary skill in the art, the output device 26 presents outputs generated by the computer process 8 that collectively convey the content of the plurality of applications 13. In the illustration provided by FIG. 1, the content presented comprises several images and the output device 26 is a display. However, it will be apparent to those of ordinary skill in the art that in other implementations, the content may be delivered in formats other than images, such as auditory, tactile, or any other format or combination of formats suitable for conveying information to a user. Further, those of ordinary skill in the art will appreciate that the type of output device 26 will vary depending on the application, and may include devices for rendering audio content, video content, printed content, rendered content, or any other type of content. For purposes of illustration, the systems and methods described herein will be primarily illustrated as displaying graphical content via a display device, although it will be understood that these exemplary systems are for purposes of illustration only and should not be construed as limiting in any way. Thus, the output produced by the applications 13 is processed and aggregated by the computer process 8 to create a single display that includes all of the content produced by the individual applications 13.
In the illustrated embodiment, each output representation appearing on display 26 is a file, and each such file may be associated with one of the applications 13. It should be understood that the term file as used herein is intended to encompass files, streaming video, streaming audio, web pages, and any other form of data that may be processed and displayed by the computer process 8. The computer process 8 generates a single output display in which one or more files generated by the applications 13 are displayed. The collection of displayed files represents content generated by the applications 13, and this content is displayed within a program window generated by the computer process 8. The program window for the computer process 8 may also include a set of icons representing tools provided with a graphical user interface that enable a user to control the operation (in this case, the display) of the files appearing in the program window.
In contrast, conventional approaches have each application program form its own display, which would result in a representation on the display device 26 that includes several program windows, typically one for each application program 13. In addition, each different type of program window would include a different set of tools for manipulating the content displayed in the window. The system 10 of the present invention therefore has the advantage of providing a consistent user interface that requires knowledge of only one set of tools for displaying and controlling different files. In addition, the computer process 8 operates on the output of the application programs 13, so that only that output is required to create the files that appear within the program window. Thus, it is not necessary for the application 13 to reside on the same machine as the process 8, nor is it necessary for the application 13 to operate in conjunction with the computer process 8. The computer process 8 only requires the output of the application programs 13, which may be derived from stored data files created by the application programs 13 at an earlier time. However, the systems and methods described herein may be used as part of a system in which an application is capable of rendering its own content, controlling at least a portion of the display 26, and rendering the content within a program window associated with the application. In these embodiments, the system and method of the present invention may operate as a separate application that appears on the display within a portion of the display provided for its use.
More specifically, FIG. 1 depicts several application programs 13. These applications may include word processing programs such as Word, WordPerfect, or any other similar word processing program. They may further include programs such as Netscape Composer, which generates HTML files; Adobe Acrobat, which processes PDF files; web servers that transmit XML or HTML; streaming servers that generate audio-video data streams; email clients or servers; databases; spreadsheet software; or any other kind of application that transmits output as a file, data stream, or in some other format suitable for use by a computer process. In the embodiment of FIG. 1, each application 13 submits its output to the computer process 8. In operation, this may be implemented by having the application 13 direct its output stream as an input byte stream to the computer process 8. The use of data streams is well known to those of ordinary skill in the art and has been described in the literature, including, for example, "Programming in C" by Stephen G. Kochan, Hayden Publishing (1983). Alternatively, the application 13 may create a data file, such as a Word file, that may be streamed into the computer process 8 by a separate application or by the computer process 8 itself.
The computer process 8 is capable of processing the various input streams to create an aggregate display for display on the display device 26. To this end, and as will be described in greater detail below, the computer process 8 processes the input streams to generate an internal representation of each input stream. In practice, the internal representation is specified to appear as close as possible to the output stream of the corresponding application 13. However, in other embodiments, the internal representation may be created to have a selected, simplified or partial similarity to the output stream produced by the corresponding application 13. Additionally and optionally, the systems and methods described herein may also employ filters on the interpreted content, allowing portions of the content to be removed from the displayed content or otherwise rendered. Further, the systems and methods described herein may allow alteration of the structure of a source file, allow restoration of content within a file, rearrange the structure of the file, or simply select certain types of data. Also, in an alternative embodiment, content may be added during the interpretation process, including active content linked to the network site. In either case, the internal representation created by the computer process 8 may be further processed by the computer process 8 to drive the display device 26 to create the collective image represented in FIG. 1.
Turning now to FIG. 2, a more detailed representation of the system of FIG. 1 is presented. In particular, FIG. 2 depicts a system 10 that includes the computer process 8, the source file 11, and a display device 26. The computer process 8 includes a number of file agent modules 12, an internal representation format file and process 14, buffer memory 15, generic object library 16, core file engine (which in this embodiment includes parsing module 18 and rendering module 19), an internal view 20, a shape processor 22 and a terminal output 24. FIG. 2 further depicts an optional input device 30 for communicating user input 40 to the computer process 8. The embodiment includes a process 8 that includes a shape processor 22. However, it will be apparent to those of ordinary skill in the art that the process 8 is merely exemplary and that the process 8 may be implemented by alternative processes and architectures. For example, the shape processor 22 may alternatively be implemented as a hardware component, such as a semiconductor device, that supports the operation of the other components of the process 8. Furthermore, it should be clear that although FIG. 2 presents process 8 as a functional block diagram comprising a single system, process 8 may be distributed across many different platforms, and it is possible that the components may be run at different times, with the output from one component of process 8 then being passed as input to the next component of process 8.
As described above, each source file 11 is associated with a file agent module 12 that is capable of translating an incoming file into an internal representation of the content of the source file 11. To determine the appropriate file agent module 12 to process a source file 11, the system 10 of FIG. 1 includes an application dispatcher module (not shown) that controls the interface between applications and the system 10. In practice, use of an external Application Programming Interface (API) is handled by the application dispatcher, which passes data, calls the appropriate file agent module 12, or otherwise carries out the request made by the application. To select the appropriate file agent module 12 for a particular source file 11, the application dispatcher notifies all loaded file agent modules 12 of the source file 11. These file agent modules 12 then respond with information about their particular suitability for translating the content of the announced source file 11. Once the file agent modules 12 have responded, the application dispatcher selects a file agent module 12 and passes a pointer, such as a URI (Uniform Resource Identifier) of the source file, to the selected file agent module 12.
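The selection handshake just described (announce the source, collect suitability responses, pick one module, hand over the URI) might be sketched as below. This is an illustration under stated assumptions rather than the actual interface: the source does not specify how suitability is expressed, so an integer score is assumed, and the class names `FileAgent`, `HtmlAgent`, `GenericTextAgent` and `Dispatcher` are hypothetical.

```python
from typing import Iterable, List, Optional


class FileAgent:
    """Hypothetical agent interface; the source does not define one."""
    name = "base"

    def suitability(self, uri: str) -> int:
        """Return a score for this source; 0 means 'cannot translate'."""
        return 0

    def translate(self, uri: str) -> str:
        # Placeholder for real translation into the internal representation.
        return f"{self.name} translated {uri}"


class HtmlAgent(FileAgent):
    name = "html"

    def suitability(self, uri: str) -> int:
        return 10 if uri.lower().endswith((".html", ".htm")) else 0


class GenericTextAgent(FileAgent):
    """Low-priority fallback that claims it can attempt any source."""
    name = "generic-text"

    def suitability(self, uri: str) -> int:
        return 1


class Dispatcher:
    def __init__(self, agents: Iterable[FileAgent]) -> None:
        self.agents: List[FileAgent] = list(agents)

    def select(self, uri: str) -> Optional[FileAgent]:
        # Announce the source to every loaded agent and collect responses.
        scored = [(agent.suitability(uri), agent) for agent in self.agents]
        best_score, best = max(scored, key=lambda pair: pair[0], default=(0, None))
        return best if best_score > 0 else None

    def open(self, uri: str) -> str:
        agent = self.select(uri)
        if agent is None:
            raise LookupError(f"no loaded agent can translate {uri}")
        return agent.translate(uri)   # only the URI is handed over


dispatcher = Dispatcher([HtmlAgent(), GenericTextAgent()])
chosen = dispatcher.select("http://example.com/index.html")
```

Raising `LookupError` when no agent responds positively is one possible way to surface the "no suitable agent" case to the caller; the source leaves the mechanism open.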
In one implementation, the computer process 8 may operate as a service through which process activities may be created to support multiprocessing of multiple file sources 11. In other embodiments, the process 8 does not support multi-process activities, but instead calls the file agent module 12 selected by the application dispatcher in the current process activity.
It is clear that the exemplary embodiment of FIG. 2 provides a flexible and extensible front end for processing input data streams of different file formats. For example, if the application dispatcher determines that the system lacks a file agent module 12 suitable for translating the source file 11, the application dispatcher may optionally signal the corresponding application 13 that the format of the source file is not recognized. Optionally, the application 13 may choose to allow reformatting of the source file 11, for example by converting the source file 11 produced by the application 13 from its present format to another format supported by the application 13. For example, an application 13 may determine that the source file 11 needs to be stored in a different format (e.g., an earlier version of the file format). Provided the application 13 supports such a format, the application 13 can re-save the source file 11 in that format so that a file agent module 12 provided by the system 10 can translate the source file 11. Optionally, the application dispatcher, upon detecting the absence of an appropriate file agent module 12 from the system 10, may indicate to the user that a new file agent module of a particular type may be required for translating the current source file 11. To this end, the computer process 8 may indicate to the user that a new file agent module needs to be loaded into the system 10, and may direct the user to a location, such as a web site, from which the new file agent module 12 may be downloaded. Alternatively, the system may automatically retrieve the agent without querying the user, or may identify a generic agent 12, such as a generic text agent that may extract the portions of the source file that represent text. Further, an agent may be provided that prompts the user for input and instructions during the translation process.
In a further alternative embodiment, an application dispatcher module and the file agent modules 12 work in combination as an input module that identifies the file format of the source file 11 according to any of various criteria, such as an explicit file type identifier within the file, the file name (including a file name extension), or known content characteristics of a particular file type. The byte stream is input to the file agent module 12 that is specific to the file format of the source file 11.
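The three identification criteria named above can be tried in sequence, as in the following sketch. The function `identify_format` and its lookup tables are hypothetical and deliberately tiny; the order in which the criteria are tried is an assumption, since the source only says that any of them may be used. The byte signatures shown for PDF, PNG and GIF are the well-known leading bytes of those formats.

```python
import os

# Explicit type identifiers found at the start of some file formats.
MAGIC = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF8": "gif",          # covers both GIF87a and GIF89a
}

# File name extensions mapped to formats (illustrative subset).
EXTENSIONS = {".html": "html", ".htm": "html", ".txt": "text", ".pdf": "pdf"}


def identify_format(name: str, head: bytes) -> str:
    """Guess the format from the first bytes of the stream and the file name."""
    # 1. Explicit file type identifier within the file.
    for signature, fmt in MAGIC.items():
        if head.startswith(signature):
            return fmt
    # 2. File name extension.
    ext = os.path.splitext(name)[1].lower()
    if ext in EXTENSIONS:
        return EXTENSIONS[ext]
    # 3. Known content characteristics (here: a crude HTML check).
    if b"<html" in head.lower():
        return "html"
    return "unknown"


example = identify_format("report.bin", b"%PDF-1.4 ...")
```

Checking the in-file identifier first means a mislabelled file name does not mislead the dispatcher, at the cost of having to read the first bytes of every stream.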
While the above description has discussed the input data being provided by a stream or computer file, it should be understood by those of ordinary skill in the art that the system 10 may also be applied to input received from an input device, such as a digital camera or scanner, as well as to input received from applications that may directly output it to the process 8 or output it to the process 8 via an operating system. In this case, the input byte stream may originate directly from the input device rather than from a source file 11. However, the input byte stream will still be in a data format suitable for processing by the system 10, and for the purposes of the present invention, input received from such an input device may be considered a source file 11.
As shown in FIG. 2, the file agent module 12 employs a standard object library 16 to generate the internal representation 14, which describes the contents of the source file in terms of a collection of file objects of generic types defined in the library 16, along with parameters defining the properties of specific instances of the various file objects within the file. Thus, the library 16 provides a set of object types that the file agent module 12, the parser 18, and the system 10 already know about. For example, the file objects employed in the internal representation 14 may include: text, bitmap graphics, and vector graphics file objects, which may or may not be active and may be two-dimensional or three-dimensional; video; audio; and various types of interactive objects (e.g., buttons and icons). The vector graphics file object may be a PostScript-like (page description language) path with specified fill and transparency. The bitmap graphics file object may include a set of child object types such as JPEG, GIF, and PNG object types. The text file object may represent a region of specific text. The region may comprise a paragraph of text, generally understood as a group of characters appearing between two separators (such as a pair of carriage returns). Each text object may include a series of characters and style information for the character string, including one or more associated fonts, punctuation, and other such style information.
The parameters defining a particular instance of a file object typically include spatial coordinates defining the physical shape, size and location of the file object, and any corresponding temporal data for file objects whose properties vary with time, thereby allowing the system to handle dynamic file structures and/or display functions. For example, a video input stream may be processed by the system 10 as a series of graphics that change at a rate of, for example, 30 frames per second. In this case, the temporal characteristics of such a graphical object indicate that the graphical object is updated 30 times per second. As described above, for text objects, the parameters also generally include the font and size applied to the string. The object parameters may also define other properties, such as transparency. It will be appreciated that the internal representation may be stored by the system in a native file format, and that the range of possible source files input to the system 10 may include files in the system's native file format. It is also possible to convert the internal representation 14 into any of several other file formats, if desired, using a suitable conversion agent.
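As a rough illustration of such instance parameters, the sketch below bundles spatial placement, transparency and an optional update rate into one record. The class `ObjectParams`, its field names and the `frame_at` helper are hypothetical; the source does not prescribe any particular layout for these parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectParams:
    """Hypothetical per-instance parameters for a file object."""
    x: float
    y: float
    width: float
    height: float
    transparency: float = 0.0                   # 0.0 = fully opaque
    updates_per_second: Optional[float] = None  # None = static object

    def frame_at(self, seconds: float) -> int:
        """Index of the graphic to show after the given elapsed time."""
        if self.updates_per_second is None:
            return 0                            # static objects never change
        return int(seconds * self.updates_per_second)


# A video handled as a graphic object that is updated 30 times per second.
video = ObjectParams(x=0, y=0, width=320, height=240, updates_per_second=30)
logo = ObjectParams(x=10, y=10, width=64, height=64, transparency=0.5)
frame_after_two_seconds = video.frame_at(2.0)
```

Treating a video as an ordinary graphic object with an update rate lets the same placement and transparency parameters apply to static and dynamic content alike.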
FIG. 3 depicts a flowchart of an exemplary process that may be performed by a file agent module 12. In particular, FIG. 3 depicts a process 50 that represents the operation of an exemplary file agent module 12, in this case a file agent module 12 adapted to translate the contents of a Microsoft Word document into the internal representation format. Specifically, the process 50 includes an initialization step 52 in which the process 50 initializes data structures, memory space, and other resources to be used by the process 50 during translation of the source file 11. After step 52, the process 50 proceeds through a series of steps 54, 58 and 60 in which the source file 11 is analyzed and divided into subsections. In the process 50 of FIG. 3, steps 54, 58 and 60 first subdivide the source file 11 into sections, then the sections into paragraphs, and then the paragraphs into the individual characters that make them up, as the source file flows into the file agent module 12. The sections, paragraphs and characters identified within the source file 11 may be recorded in a block table (piece table) that contains pointers to the different subsections within the source file 11. It will be appreciated by those of ordinary skill in the art that the block table depicted in FIG. 3 represents a construct employed by MS Word for providing pointers to different, identified subsections of a file. It will further be appreciated that the use of a block table or a similarly structured table is optional and depends on the application at hand, including the type of file being processed.
When the process 50 begins to identify the different characters appearing within a particular paragraph at step 60, the process 50 may proceed to step 62, where a style is applied to the character or group of characters identified at step 60. Applying a style is understood to mean associating the identified characters with the presentation style that is used with those characters. The presentation style may include properties associated with the character, including the font type, the font size, and whether the character is bold, italic, or otherwise styled. Additionally, in step 62, the process may determine whether the character is rotated, or placed along a curved path or other shape. Also in step 62, styles associated with the paragraph in which the characters are located may be identified and associated with the characters. Such properties may include the line spacing associated with the paragraph, the margins associated with the paragraph, the spacing between characters, and other such properties.
After step 62, the process 50 proceeds to step 70, where the internal representation is built. The object describing the file structure is created in step 64 as an object within the internal representation, and the style associated with that object, together with the character string it contains, is created separately within the internal representation in step 68. FIGS. 6, 7, and 8, which will be explained in greater detail later, depict the file structures created by the process 50, where the structure of a file is recorded by a set of file objects and the data associated with the file objects is stored in a separate data structure. After step 70, the process 50 proceeds to decision block 72, where it determines whether the paragraph associated with the last processed character is complete. If the paragraph is not complete, the process 50 returns to step 60 to read the next character of the paragraph. Alternatively, if the paragraph is complete, the process 50 proceeds to decision block 74, where it determines whether the section is complete. If the section is not complete, the process 50 returns to step 58 to read the next paragraph from the piece table. Alternatively, if the section is complete, the process 50 proceeds to step 54, where the next section, if any, is read from the piece table, and processing continues. Once the document has been processed, the process 8 may transmit, export, or store the interpreted document for later use.
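The nested loop of process 50 can be sketched as follows. This is a minimal illustration only, assuming the source file has already been split into sections, paragraphs and styled characters; the names `Style`, `TextObject`, `InternalRepresentation` and `translate` are hypothetical and do not come from the description.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Style:
    """Hypothetical presentation style (step 62)."""
    font: str
    size: int
    bold: bool = False


@dataclass
class TextObject:
    """Structural object: holds a handle to its content, not the content."""
    content_id: int
    style: Style


@dataclass
class InternalRepresentation:
    structure: list = field(default_factory=list)  # file objects (step 64)
    content: dict = field(default_factory=dict)    # data kept separately (step 68)

    def add_text(self, text: str, style: Style) -> None:
        content_id = len(self.content) + 1
        self.content[content_id] = text
        self.structure.append(TextObject(content_id, style))


def translate(sections, rep: InternalRepresentation) -> InternalRepresentation:
    """sections -> paragraphs -> (character, Style) pairs, as in steps 54, 58, 60."""
    for section in sections:               # step 54: next section
        for paragraph in section:          # step 58: next paragraph
            for char, style in paragraph:  # steps 60 and 62: character plus style
                rep.add_text(char, style)  # step 70: build the representation
    return rep
```

The two inner loops ending correspond to decision blocks 72 and 74: a paragraph or section is "complete" when its iterator is exhausted.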
The system may store the interpreted file in a format compatible with the internal representation, and may optionally store it in other formats, including a format compatible with the file format of the source file 11 (which may employ an "output document agent module", not shown, capable of receiving internal representation data and creating source file data), a binary format, a textual file description structure, marked-up text, or any other suitable format. A universal text encoding model, including Unicode, shift-mapping, and Big-5, and a luminance/chrominance model may also be employed.
As can be seen from the above, the format of the internal representation 14 separates the "structure" (or "layout") of the file, as specified by the object types and their parameters, from the "content" of the various objects; for example, the character string (content) of a text object is separated from the object's spatial parameters, and the image data (content) of a graphical object is separated from its spatial parameters. This allows the file structure to be defined in a compact manner and provides the option for content data to be stored remotely and retrieved by the system only when required. The internal representation 14 describes the file and its constituent objects in "high-level" terms.
The document agent module 12, as described above with reference to FIG. 3, is capable of processing a data file created by the MS Word word processing application and translating that data file into an internal representation formed from a set of object types, selected from the library 16, that represent the contents of the processed file. Thus, the document agent module 12 parses the Word document and translates the structure and content of the document into an internal representation known to the computer process 8. An example of a Word document that may be processed by the document agent module 12 is depicted in FIG. 4. Specifically, FIG. 4 depicts a Word file 32 created by the MS Word application. The file 32 comprises one page of information, and the page comprises two columns of text 34 and a chart 36. FIG. 4 further illustrates that the text columns 34 and the chart 36 are arranged on the page 38 as follows: one column of text runs from the top of the page 38 to the bottom of the page 38, a second column of text runs from near the middle of the page to the bottom of the page, and the chart 36 is placed above the second column of text 34.
As described above with reference to FIG. 3, the document agent module 12 begins processing the document 32 by determining that the document 32 includes one page and contains several different objects. For each page it finds, the document agent module 12 identifies the style of the page, which may be, for example, the page style of an 8.5 x 11 page in portrait format. The page styles identified by the document agent module 12 are embedded into the internal representation for later use by the parser 18 in formatting and flowing text into the document created by the process 8.
The file 32 depicted in FIG. 4 has only one page. It will be clear, however, that the document agent module 12 may process a Word document that includes several pages. In this case, the document agent module 12 will process each page separately by creating a page and then populating it with objects of the types found in the library. The page style information may thus indicate that a file contains several pages and that the pages have a certain size. Other page style information may be identified by the document agent module 12, and the page style information identified may vary depending on the application. Thus, different page style information may be identified by a document agent module capable of processing Microsoft Excel files or a real-time media data stream.
As further described with reference to FIG. 4, once the document agent module 12 has identified the page style, it may begin to break down the document 32 into several objects that can be mapped to file objects known to the system and generally stored in the library 16. For example, the document agent module 12 may process the file 32 to find text objects, bitmap objects, and vector graphics objects. Other object types may optionally be provided, including video types, animation types, button types, and script types. In this implementation, the document agent module 12 will recognize a text object 34 whose associated style has two columns. The text paragraphs that appear within the text object 34 may be analyzed to identify each character in each respective paragraph. The process 50 may apply a style property to each identified string, and each string identified within the file 32 may be mapped to a text object of the type listed within the library 16. Each string and its applied style may be understood as an object that the document agent module 12 has found in the file 32 and has translated into a file object, in this case a text object of the type listed in the library 16. The internal representation objects may stream from the document agent module 12 into the internal representation 14. The document agent module 12 may continue to translate the objects that appear within the file 32 into file objects known to the system 10 until every object has been translated. The object types may be chosen to suit the application and may include object types suitable for translating source data representing digital files, audio/video presentations, music files, interactive scripts, user interface files, image files, and any other file types.
Turning now to FIG. 5, it can be seen that the process 80 depicted in FIG. 5 allows similar objects appearing in the internal representation of a source file 11 to be compressed in order to reduce the size of the internal representation. For example, FIG. 5 depicts a process 80 in which step 82 has a base library object A that is processed in step 84 by inserting the base object into a file that will be the internal representation of the source file 11. In step 88, another object B provided by the document agent module 12 is passed to the internal representation file process 14. The process 80 then proceeds through a sequence of steps 92-98 in which the characteristics of object A are compared with the characteristics of object B to determine whether the two objects have the same characteristics. For example, if object A and object B represent two characters, such as the letter P and the letter N, and if the characters P and N have the same color, the same font, the same size, and the same style (e.g., bold or italic), then the process 80 joins the two objects in step 94 into a single object class stored in the internal representation. If the characteristics do not match, the process 80 adds them to the internal representation as two separate objects.
FIG. 5 depicts a process 80 in which the internal representation file 14 compresses the objects as a function of the similarity of physically adjacent objects. It will be appreciated by those of ordinary skill in the art that this is merely one process for compressing objects and that other methods may be employed. For example, in an alternative implementation, the compression process may include a process for compressing visually adjacent objects.
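The comparison-and-join behaviour of process 80 can be illustrated with a short sketch. It assumes, purely for illustration, that each text object carries a string and a style made up of color, font, size and weight; the names `Style`, `TextRun` and `add_object` are hypothetical.

```python
from typing import List, NamedTuple


class Style(NamedTuple):
    color: str
    font: str
    size: int
    weight: str  # e.g. "regular", "bold", "italic"


class TextRun(NamedTuple):
    text: str
    style: Style


def add_object(runs: List[TextRun], text: str, style: Style) -> None:
    """Join the new object onto the previous one when every characteristic
    matches (step 94); otherwise store it as a separate object."""
    if runs and runs[-1].style == style:
        runs[-1] = TextRun(runs[-1].text + text, style)
    else:
        runs.append(TextRun(text, style))
```

With this sketch, adding "P" and then "N" in the same style leaves one run containing "PN", while a following character in a different style starts a new run.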
FIGS. 6, 7 and 8 depict the structure of an internal representation of a document that has been processed by the system depicted in FIGS. 1 and 2. The internal representation of the file may be embodied as a computer file or as data held in memory. However, it will be apparent to those of ordinary skill in the art that the data structure selected for recording or transmitting the internal representation may vary depending on the application, and any suitable data structure may be used in conjunction with the systems and methods described herein without departing from the scope of the present invention.
As will be described in more detail below, the internal representation of the processed file separates the structure of the file from its content. In particular, the structure of a file is recorded by a data structure that identifies the different file objects that make up the file and the way those file objects are arranged with respect to each other. The separation of structure from content is illustrated in FIG. 6, where a data structure 110 records the structure of the file being processed and stores that structure in a data format that is independent of the content associated with the file. In particular, the data structure 110 includes a resource table 112 and a file structure 114. The resource table 112 provides a list of the resources used to construct the internal representation of the file. For example, the resource table 112 may include one or more tables of common structures, such as font, link, and color lists, that appear inside the document. These common structures may be referenced by number within the resource table 112. The resources of the resource table 112 relate to the file objects that are positioned within the file structure 114. As shown in FIG. 6, the file structure 114 includes a number of storage packages (containers) 118, represented by nested sets of brackets. Within a storage package 118 are several file objects 120. As shown in FIG. 6, a storage package 118 represents a collection of file objects that appear within the file being processed. FIG. 6 further illustrates that a storage package 118 can also hold child storage packages. For example, the file structure 114 includes a top-level storage package, identified by the outer set of brackets labeled 1, and three nested storage packages 2, 3, and 4. Storage package 4 is doubly nested, lying within storage package 3, which in turn lies within storage package 1.
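The nesting just described can be sketched as a small tree. This is an illustrative model only: the class names, the `font` index field, the example font and color names, and the `depth_of` helper are assumptions, not part of the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class FileObject:
    number: int                 # object number, used to look up its content
    kind: str                   # e.g. "text" or "bitmap"
    font: Optional[int] = None  # index into the resource table, if any


@dataclass
class Container:
    label: int
    children: List[Union["Container", FileObject]] = field(default_factory=list)


# Resource table: common structures referenced by number (values are made up).
resource_table = {"fonts": ["Times New Roman", "Arial"], "colors": ["black"]}

# File structure of FIG. 6: package 1 holds packages 2 and 3; package 3 holds 4.
file_structure = Container(1, [
    Container(2, [FileObject(1, "text", font=0)]),
    Container(3, [Container(4, [FileObject(2, "bitmap")])]),
])


def depth_of(node, label: int, depth: int = 1) -> Optional[int]:
    """Return the nesting depth of the container with the given label."""
    if isinstance(node, Container):
        if node.label == label:
            return depth
        for child in node.children:
            found = depth_of(child, label, depth + 1)
            if found is not None:
                return found
    return None
```

Here `depth_of(file_structure, 4)` evaluates to 3, reflecting that package 4 sits inside package 3 inside package 1.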
Each storage package 118 represents a feature within a file, where the feature may be a collection of individual file objects (e.g., the file objects 120). Thus, for example, a file such as the file 32 depicted in FIG. 4 may include a storage package representing the character string that makes up the text appearing in column 34. The different file objects 120 that appear within the string storage package may, for example, represent the different paragraphs that appear within the string. The string storage package has a style associated with it. For example, the character string depicted in FIG. 4 may include style information indicating the font type, the font size, and the style (e.g., bold or italic), as well as style information indicating the size of the column in which the character string or a portion of it appears, including its width and length. This style information may be used later by the parsing module 18 to reformat and reflow the text within the context-specific view 20. Another example of a storage package is a table, which may appear, for example, in a column 34 of the file 32. The table may be a storage package containing objects. Other types and uses of storage packages will vary depending on the application involved, and the systems and methods of the present invention are not limited to any particular set of object types or storage packages.
Thus, as the document agent module 12 translates a source file 11, it will encounter objects belonging to known object types, and it will request the library 16 to create an object of the appropriate object type. The document agent module 12 then places the created file objects at the appropriate locations within the file structure 114 so as to preserve the overall structure of the source file 11. For example, when the document agent module 12 encounters an image 36 in the source file 11, it recognizes the image 36 (which may be a JPEG image, for example) as an object of the bitmap type, with the optional subtype JPEG. As shown in steps 64 and 68 of FIG. 3, the document agent module 12 may create an appropriate file object 120 and may place the created file object 120 into the structure 114. Additionally, the data for the JPEG image file object 120, or, in another example, the characters of a character string and the style associated with them, may be stored in the data structure 150 depicted in FIG. 8.
While the source file 11 is being processed, the document agent module 12 may identify other storage packages that represent sub-features occurring within an existing storage package (e.g., a string). For example, these sub-features may include links to reference material, or clipped visual regions or features that appear within the file and contain collections of individual file objects 120. The document agent module 12 may place these file objects 120 in a separate storage package that is nested inside an already existing storage package. The layout of these file objects 120 and storage packages 118 is illustrated in FIG. 7A as a tree structure 130, where the individual storage packages 1, 2, 3, and 4 are shown as storage package objects 132, 134, 138, and 140, respectively. The storage packages 118 and file objects 120 are arranged in a tree structure that shows the nested storage package structure of the file structure and the different file objects 120 that exist within each storage package 118. The tree structure of FIG. 7A also illustrates that the structure 114 records and maintains the structure of the source file 11, with the source file being represented as a hierarchy of file objects 120, where the file objects 120 include style information, such as the size of the column in which a string appears, or temporal information, such as the frame rate for streaming content. Thus, the graphical structure of each file is described by a series of parameterized elements. An example is given in Table 1 below.
TABLE 1
Parameter | Example |
Type | Bitmap |
Bounding box | 400, 200; 600, 700 units (lower left, upper right) |
Fill | Object 17 |
Alpha | 0 (none) |
Shape | Object 24 |
Time | 0, -1 (infinity) (start, end) |
Table 1 gives an example of the parameters that can be used to describe the graphical structure of a file. The first parameter is the object type, in this case a bitmap object type. The bounding box parameter gives the location of the file object in the source file 11. Table 1 further provides the fill used and an alpha factor representing the transparency of the object. The shape parameter provides a handle to the shape of the object, which in this case may be a path defining the outline of the object, including irregularly shaped objects. Table 1 also gives a time parameter that represents the temporal change of the object. In this example, the image is stable and does not change over time. However, if the image object is to be rendered as streaming media, the parameter may include a temporal characteristic indicating the rate at which the object changes, such as a rate corresponding to the desired frame rate of the content.
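The parameter set of Table 1 maps naturally onto a small record type. The sketch below is illustrative; the field names and helper methods are assumptions, and only the values are taken from Table 1.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ObjectParams:
    object_type: str                       # "Type" row
    bounding_box: Tuple[int, int, int, int]  # left, bottom, right, top in units
    fill: int                              # handle to the content object
    alpha: float                           # 0 means no transparency, per Table 1
    shape: int                             # handle to the outline path object
    time: Tuple[int, int]                  # (start, end); -1 stands for infinity

    def size(self) -> Tuple[int, int]:
        left, bottom, right, top = self.bounding_box
        return (right - left, top - bottom)

    def is_unbounded_in_time(self) -> bool:
        return self.time[1] == -1


# The example row values from Table 1.
table1 = ObjectParams("Bitmap", (400, 200, 600, 700), fill=17, alpha=0.0,
                      shape=24, time=(0, -1))
```

For the Table 1 values, the bounding box spans 200 by 500 units and the time range has no end.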
Thus, the structural elements are storage packages whose flowable data content is maintained separately and is referenced by handles held in the storage packages. In this way, any or all of the data content may be kept remotely from the file structure. This allows a file to be presented from a mixture of locally held and remotely held data content. In addition, the data structure allows rapid progressive rendering of the internal representation of the source file: broader, higher-level objects may be rendered first, and finer features may be rendered afterwards. Thus, separating structure from data allows the visual file to be presented while streamed data "fills in" the content. In addition, the separation of content and structure allows the content of the file to be edited or changed easily. Since the file structure is independent of the content, different content can be substituted into the file structure. This can be done one storage package at a time or for the whole file. The structure of the file may be transferred independently of the content, with the content provided later, or the structure and content may be transferred together to a platform for rendering.
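One way to picture handle-based, on-demand content retrieval is sketched below. It is a simplified illustration under stated assumptions: the `ContentStore` class, the `fetch` callable standing in for a local or remote source, and the two render functions are all hypothetical.

```python
class ContentStore:
    """Resolves content handles lazily; `fetch` may read locally or remotely."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}
        self.fetch_count = 0

    def get(self, handle):
        if handle not in self._cache:
            self._cache[handle] = self._fetch(handle)
            self.fetch_count += 1
        return self._cache[handle]


def render_outline(structure):
    """First pass: uses structure only, so no content is retrieved."""
    return [(obj["type"], obj["bbox"]) for obj in structure]


def render_full(structure, store):
    """Second pass: fills each object with its content via its handle."""
    return [(obj["type"], store.get(obj["handle"])) for obj in structure]
```

Because the outline pass never touches the store, the layout can be shown before any content has arrived, and swapping in a different `fetch` substitutes different content into the same structure.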
In addition, FIG. 7A shows that the structure of the source file 11 can be represented as a tree structure 130. In practice, the tree structure can be modified and edited to change the representation of the source file 11. For example, the tree structure may be modified to add additional structure and content to the tree 130. This is depicted in FIG. 7B, which shows the original tree structure of FIG. 7A duplicated and placed under a higher-level storage package. Thus, FIG. 7B illustrates that a new representation can be created by processing the tree structure 130 generated by the document agent module 12 to create a new file structure. This allows the visual position of objects within a file to be changed while the relative positions of the different objects 120 remain the same. By adjusting the tree structure 130, the system described herein can edit and modify the content. For example, in those applications where the content in the tree structure 130 represents visual content, the system described herein may edit the tree structure to duplicate the image of the file and present the two images side by side. Alternatively, the tree structure 130 may be edited and supplemented to add additional visual information, such as an image of a new file or of a portion of a file. In addition, by controlling the rate at which the tree structure changes, the system described herein can create the illusion that a file changes gradually, for example by sliding across a display such as the display device 26, or by gradually changing into a new file. Other effects, such as the creation of thumbnails and similar results, can also be achieved, and modifications to the systems and methods described herein by those of ordinary skill in the art will be within the scope of the present invention.
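The side-by-side duplication described for FIG. 7B can be sketched with a tree of nested dictionaries. This is an assumed representation chosen for brevity: each node has an `offset` relative to its parent, leaves carry a content `handle`, and the function names are hypothetical.

```python
import copy


def side_by_side(tree, width):
    """Return a new top-level container holding the tree and a shifted copy."""
    duplicate = copy.deepcopy(tree)  # copies structure only; handles are reused
    ox, oy = tree["offset"]
    duplicate["offset"] = (ox + width, oy)
    return {"offset": (0, 0), "children": [tree, duplicate]}


def absolute_positions(node, origin=(0, 0)):
    """Flatten the tree into (handle, absolute position) pairs."""
    x = origin[0] + node["offset"][0]
    y = origin[1] + node["offset"][1]
    if "children" not in node:
        return [(node["handle"], (x, y))]
    positions = []
    for child in node["children"]:
        positions.extend(absolute_positions(child, (x, y)))
    return positions
```

Only the offset of the duplicated subtree changes, so the objects inside each copy keep their positions relative to one another, and both copies refer to the same content through the same handles.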
The data of the source file 11 is stored independently of the structure 114. To this end, each file object 120 includes a pointer to the data associated with the object, and this information may be arranged in an indirection list, such as the one depicted in FIG. 8. As shown in FIG. 8, in this implementation each file object 120 is numbered and an indirection list 152 is created, in which each file object number 154 is associated with an offset 158. For example, the file object number 1, identified by reference numeral 160, may be associated with the offset 700, identified by reference numeral 162. Thus, the indirection list associates object number 1 with offset 700. The offset 700 may represent a location in memory, or a file offset, at which the data associated with object 1 resides. As further shown in FIG. 8, there may be a data structure 150 in which the data representing the content associated with each file object 120 is stored. Thus, for example, the data for object 1 at offset 700 may include Unicode characters representing the characters present in the string of storage package 1 depicted in FIG. 6. Similarly, the data for object 2, depicted in FIG. 8 by reference numeral 172 and associated with the memory location 810 identified by reference numeral 170, may represent a JPEG bitmap associated with a bitmap file object 120 referenced within the file structure of FIG. 6.
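The indirection list of FIG. 8 can be sketched as a mapping from object numbers to offsets into a single data area. The offsets 700 and 810 come from the text; the length-prefixed record layout, the UTF-16 encoding, and the function names are assumptions made only so that the example is runnable.

```python
import struct


def write_record(data: bytearray, offset: int, payload: bytes) -> None:
    """Store a 4-byte little-endian length followed by the payload at `offset`."""
    end = offset + 4 + len(payload)
    if len(data) < end:
        data.extend(b"\x00" * (end - len(data)))
    struct.pack_into("<I", data, offset, len(payload))
    data[offset + 4:end] = payload


def read_record(data: bytearray, offset: int) -> bytes:
    (length,) = struct.unpack_from("<I", data, offset)
    return bytes(data[offset + 4:offset + 4 + length])


# Indirection list: file object number -> offset into the data area.
indirection = {1: 700, 2: 810}

data_area = bytearray()
write_record(data_area, indirection[1], "Hello".encode("utf-16-le"))  # text content
write_record(data_area, indirection[2], b"\xff\xd8\xff\xe0")          # JPEG header bytes


def content_of(object_number: int) -> bytes:
    return read_record(data_area, indirection[object_number])
```

The structure only ever stores object numbers, so content can be moved within the data area by updating the indirection list alone.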
It should be noted by those of ordinary skill in the art that, because data is separated from structure, the content of a source file can be held in a centralized repository. The system described herein therefore allows compression across different types of data objects. Such processing provides greater storage flexibility in systems with limited resources.
Returning to FIG. 2, once the process for compressing the contents of the internal representation file has finished compressing the various objects, the objects are passed to the parsing module 18. The parsing module 18 parses the objects identified in the structural part of the internal representation and, with reference to the data content associated with each object, re-applies the position and style information to that object. The rendering component 19 generates a context-specific representation, i.e. a "view" 20, of the files represented by the internal representation 14. The required view may be of all of the files, of a whole file, or of part of one or some of the files. The rendering component 19 receives a view control input 40, which defines the viewing context of the particular file view to be generated, as well as any relevant temporal parameters. For example, the system 10 may be required to generate a zoomed view of a portion of a document, and then to pan or scroll the zoomed view to display an adjacent portion of the document. The view control input 40 is interpreted by the rendering component 19 to determine which portion of the internal representation is required for a particular view, and how, when, and for how long the view is to be displayed.
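Selecting the part of the internal representation needed for a zoomed or panned view can be sketched as a bounding-box intersection test. The representation of the view control input as a center point and zoom factor, and all function names, are assumptions for illustration.

```python
def viewport(center, zoom, display_size):
    """Region of the file visible on the display, as (left, bottom, right, top)."""
    half_w = display_size[0] / (2 * zoom)
    half_h = display_size[1] / (2 * zoom)
    return (center[0] - half_w, center[1] - half_h,
            center[0] + half_w, center[1] + half_h)


def intersects(a, b):
    """True when two (left, bottom, right, top) boxes overlap."""
    return a[0] < b[2] and a[2] > b[0] and a[1] < b[3] and a[3] > b[1]


def objects_in_view(objects, view):
    """Object numbers whose bounding boxes fall at least partly inside the view."""
    return [obj["number"] for obj in objects if intersects(obj["bbox"], view)]
```

Panning changes only `center` and zooming changes only `zoom`; in both cases the structure is consulted first, and content is needed only for the objects returned.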
The context-specific representation/view 20 is expressed in terms of basic shapes and parameters.
The rendering component 19 may also perform additional pre-processing functions on the relevant portions of the internal representation when generating the desired view 20 of the source document 11. The visual representation 20 is input to a shape processor 22 for processing to produce an output in a suitable format to drive an output device 26, such as a display device or printer.
The pre-processing functions of the rendering component 19 may include color correction, resolution adjustment/enhancement, and anti-aliasing. Resolution enhancement may include a scaling function that maintains the sharpness of an object's content when it is displayed or reproduced by the target output device. Resolution adjustment may be context dependent; for example, the display resolution of a particular object may be reduced while the displayed file view is being panned or scrolled, and increased when the file view is static.
Optionally, there may be a feedback path 42 between the parsing module 18 and the internal representation 14, for example to trigger an update of the content of the internal representation 14, such as where the source file 11 represented by the internal representation comprises a multi-frame animation.
The output of the rendering component 19 expresses the file in terms of primitive objects. For each file object, the representation produced by the rendering component 19 defines the object at least in terms of a physical, rectangular bounding box, the actual outline path of the object bounded by that bounding box, the data content of the object, and its transparency.
The shape processor 22 interprets the primitive objects and converts them into an output frame format suitable for the target output device 26, such as a dot map for a printer, a vector instruction set for a plotter, or a bitmap for a display device. An output control input 44 to the shape processor 22 provides the information the shape processor 22 needs to produce an output suitable for the particular output device 26.
The shape processor 22 preferably processes the objects defined by the view representation 20 in terms of "shape" (i.e., the outline shape of the object), "fill" (the data content of the object), and "alpha" (the transparency of the object), performing scaling and clipping appropriate for the desired view and output device (for most types of display device or printer, typically in terms of pixels obtained by scan conversion or the like). The shape processor 22 optionally includes an edge buffer that defines the shape of an object in terms of scan-converted pixels, and preferably applies anti-aliasing to the outline shape. Anti-aliasing may be performed by applying a gray-scale ramp across the object boundary in a manner determined by the characteristics of the output device 26. This approach enables shape clipping and shape intersection processing that is efficient in both memory and processor use. A look-up table or other technique may be employed to define multiple tone response curves, providing non-linear rendering control. The individual primitive objects processed by the shape processor 22 are combined into a composite output frame. One design of a shape processor suitable for use with the system described herein is shown in more detail in a patent application entitled Shape Processor, filed on even date herewith, the contents of which are incorporated by reference. However, any suitable shape processor system or process may be employed without departing from the scope of the present invention.
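The shape, fill and alpha terms can be illustrated on a single scanline. This is not the shape processor referred to above; it is a simplified sketch that assumes a one-dimensional shape spanning the interval from `x0` to `x1`, gray-level fills from 0 to 255, a transparency value where 0 means opaque (following the "0 (none)" entry of Table 1), and an optional 256-entry tone look-up table.

```python
def coverage(pixel: int, x0: float, x1: float) -> float:
    """Fraction of pixel [pixel, pixel + 1) covered by the shape [x0, x1)."""
    return max(0.0, min(pixel + 1, x1) - max(pixel, x0))


def render_scanline(width, x0, x1, fill, transparency, background=255, lut=None):
    """Composite a filled span over a uniform background.

    Partial coverage at the span edges yields intermediate gray levels,
    which is the gray-scale ramp used for anti-aliasing.
    """
    row = []
    for pixel in range(width):
        weight = coverage(pixel, x0, x1) * (1.0 - transparency)
        value = round(background + weight * (fill - background))
        row.append(lut[value] if lut is not None else value)
    return row


# Example tone response curve (a gamma-style look-up table, chosen arbitrarily).
gamma_lut = [round(255 * (v / 255) ** 0.5) for v in range(256)]
```

A black span from 0.25 to 2.0 on a white background covers three quarters of the first pixel and all of the second, so those pixels come out dark gray and black while the rest stay white.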
As mentioned above, the process 8 depicted in FIG. 1 may be implemented as a software component running on a data processing system such as a handheld computer, mobile phone, set-top box, facsimile machine, copier or other office equipment, embedded computer system, Windows or UNIX workstation, or any other computer/processing platform capable of supporting, in whole or in part, a document processing system as described above. In these embodiments, the system may be implemented as a C language computer program or as a computer program written in any high-level language, including C++, Fortran, Java, or Basic. Further, in embodiments employing a microcontroller or DSP (digital signal processor), the system may be implemented as a computer program written in microcode, or written in a high-level language and compiled into microcode that can be executed on the platform employed. The development of such systems is known to those of ordinary skill in the art, and such techniques are set forth in the Intel StrongARM SA-1110 Microprocessor Advanced Developer’s Manual. In addition, general techniques for high-level programming are known and are set forth in, for example, Stephen G. Kochan, Programming in C (Hayden Publishing, 1983). It should be noted that digital signal processors are particularly well suited to implementing signal processing functions, including pre-processing functions such as image enhancement by adjusting contrast, edge sharpness and brightness. Code development for digital signal processors and microcontroller systems follows principles well known in the art.
Thus, although FIGS. 1 and 2 graphically depict the computer process 8 as comprising several functional block elements, it will be apparent to a person skilled in the art that these elements may be implemented as a computer program, or as parts of a computer program, capable of running on a data processing platform so as to configure that platform as a system according to the present invention. Furthermore, while FIG. 1 depicts the system 10 as an integrated arrangement of a document processing process 8 and a display device 26, it will be apparent to those of ordinary skill in the art that this is merely an example and that the system described herein may be implemented with other architectures and arrangements, including a system architecture that separates the document processing functions of the process 8 from the document display operations performed by the display device 26. Furthermore, it will be clear that the system of the invention is not limited to systems that include a display or output device, but also covers processing systems that process one or more digital files to create an output that can be presented on an output device. Alternatively, the output may be saved in a data file for subsequent presentation on a display device, for long-term storage, for transmission over a network, or for purposes other than direct display. Thus, it will be apparent to those of ordinary skill in the art that the systems and methods described herein may support many different file and content processing applications, and that the architecture of the system or process for a particular application will vary depending on the application and the designer's choice.
From the foregoing, it will be appreciated that the system of the present invention may be "hardwired", for example in ROM and/or integrated into an Application Specific Integrated Circuit (ASIC) or other single-chip system; or may be implemented as firmware (in programmable read only memory, such as erasable programmable read only memory); or may be implemented as software, stored locally or remotely and retrieved and executed as required by a particular device. Such improvements and modifications may be incorporated without departing from the scope of the invention.
Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the embodiments and implementations described herein. For example, the systems and methods described herein may be stand-alone systems for processing source files 11, but alternatively, these systems may be incorporated in many different ways into various types of data processing systems and devices, as well as into peripheral devices. In a general-purpose data processing system ("host system"), the system of the present invention may be incorporated in parallel with the operating system and applications of the host system, or may be incorporated in whole or in part into the host operating system. For example, the system described herein is capable of quickly displaying various types of data files on a portable data processing device having an LCD display without requiring the use of a browser or application program. Examples of portable data processing devices in which the present system may be employed include "palm-top" computers, portable digital assistants (PDAs, including tablet-type PDAs, where the primary user interface includes graphical displays through which a user interacts directly with a stylus device), mobile telephones and other communication devices that have access to the internet. For portability, such data processing devices require small, low-power processors. Typically, these devices employ advanced RISC-type core processors designed in ASICs (application specific integrated circuits) in order to make the electronics package small and integrated. Such devices also have limited random access memory and are generally devoid of non-volatile data storage (e.g., hard disks). 
Conventional operating system models, such as those employed in standard desktop computing systems (PCs), require powerful central processing units and mass storage for processing digital files and producing efficient output, and are entirely unsuitable for such data processing devices. Specifically, conventional systems do not provide for processing multiple file formats in an integrated manner. In contrast, the system described herein employs a common processing and delivery approach for all file formats, thereby providing a highly integrated file processing system that is extremely efficient in terms of power consumption and utilization of system resources.
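By way of illustration only, the common processing approach for all file formats can be pictured as a single dispatch step placed in front of one shared pipeline. The following Python sketch is not taken from the specification: the names `AGENTS`, `identify_format`, and `to_internal`, the two example formats, and the simple name-then-content identification rule are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical internal representation: a list of (object_type, content) pairs.
Internal = List[Tuple[str, str]]


def parse_plain_text(data: bytes) -> Internal:
    # One text object per non-empty line.
    return [("text", line) for line in data.decode("utf-8").splitlines() if line]


def parse_csv(data: bytes) -> Internal:
    # One text object per comma-separated cell.
    rows = data.decode("utf-8").splitlines()
    return [("text", cell) for row in rows for cell in row.split(",")]


# Registry mapping a format name to the module that translates that format.
AGENTS: Dict[str, Callable[[bytes], Internal]] = {
    "txt": parse_plain_text,
    "csv": parse_csv,
}


def identify_format(name: str, data: bytes) -> str:
    # Try the file name first, then fall back to inspecting the content.
    ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    if ext in AGENTS:
        return ext
    return "csv" if b"," in data else "txt"


def to_internal(name: str, data: bytes) -> Internal:
    # Every format ends up in the same internal representation.
    return AGENTS[identify_format(name, data)](data)
```

In this sketch, supporting a further format would only require registering one more entry in `AGENTS`; the code that consumes the internal representation would not change.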
The system of the present invention may be integrated at the BIOS level of a portable data processing device, enabling file processing and output with much lower system overhead than conventional system models. Alternatively, the system may be implemented at the lowest system level, just above the transport protocol stack. For example, the system may be incorporated into a network device (card) or system to provide embedded processing of network traffic (e.g., packet-level operation in a TCP/IP system).
Here, the system may be configured to operate with a predetermined set of data file formats and specific output devices, such as a visual display device of the apparatus and/or at least one printer.
The system described herein may also be incorporated into low-cost data processing terminals such as enhanced telephones and "thin" network client terminals (e.g., network terminals with limited local processing and storage resources), as well as "set top boxes" for use with cable television systems that can interact with or access the internet. The system may also be incorporated into peripheral devices such as hard copy devices (printers and plotters), display devices (e.g., digital projectors), network devices, input devices (cameras, scanners, etc.), and multifunction peripherals (MFPs). When incorporated into a printer, the system enables the printer to receive raw data files from a host data processing system and accurately reproduce the contents of those files without the need for special applications or drivers provided by the host system. This avoids or reduces the need to configure the computer system to drive a particular type of printer. The present system directly generates a dot-mapped image of the source file suitable for output by a printer (whether the present system is incorporated into the printer itself or into a host system). Similar considerations apply to other hard copy devices, such as plotters.
When incorporated into a display device (e.g., a projector), the present system enables the device to accurately display the contents of the original data file without the use of applications or drivers on the host system and without the need for specific configuration of the host system and/or the display device. When these types of peripheral devices are equipped with the present system, data files from any source may be received and output over any type of data communication network.
Further, the systems and methods described herein may be incorporated into an in-vehicle system to provide information to the driver, or an entertainment system to facilitate the transfer of information within the vehicle or the distribution of information to a network for communication outside of the vehicle. Further, it should be understood that the system described herein may drive a device having multiple output sources so that a consistent display is maintained with only modifications to the control parameters. Examples include, but are not limited to, an STB or in-car system that incorporates a visual display and print head to allow viewing and printing of files without the need for source applications and drivers.
From the foregoing, it should be appreciated that the system of the present invention may be "hardwired", for example in ROM and/or integrated into an application-specific integrated circuit (ASIC) or other single-chip system; may be implemented as firmware (programmable read-only memory, such as erasable programmable read-only memory, EPROM); or may be implemented as software and stored locally or remotely for retrieval and execution as required by a particular device.
It is understood, therefore, that this invention is not limited to the embodiments disclosed herein, but is to be accorded the widest scope consistent with the claims.
Claims (40)
1. A method for representing digital content, the method comprising the steps of:
(a) receiving source data (11) representing the digital content;
(b) processing said source data (11) to translate said source data into an internal representation (14) of said digital content, said internal representation (14) having a predetermined format; wherein the source data processing comprises:
(c) identifying objects (34, 36) present within the source data (11);
(d) creating a file object (120) for each object identified within the source data (11), the file object (120) representing an internal representation of the encountered object, and the file object (120) separating the structure of the object from the data content of the object;
(e) grouping the file objects (120) into a file structure (114), the file structure (114) representing a structure of the digital content;
(f) grouping data content (168, 172) of the objects into a data content structure (150); and
(g) providing a set of pointers that associate the file objects (120) in the file structure (114) with the data content (168, 172) stored in the data content structure (150);
the method is characterized in that:
receiving source data (11) representing said digital content in any one of a plurality of predetermined formats; and further characterized by:
prior to performing step (a), providing a plurality of file proxy modules (12), each of said file proxy modules (12) for translating source data (11) in at least one of said plurality of formats into said internal representation format (14);
after step (a) and before step (b), identifying a particular file proxy module (12) from the plurality of file proxy modules (12), the particular file proxy module (12) for translating the current source data (11) into the format (14) of the internal representation; and
performing the processing of step (b) using the particular file proxy module (12), whereby the format of the internal representation (14) is independent of the format of the source data (11).
2. The method of claim 1, further comprising creating an indirection list (160), the indirection list (160) storing the set of pointers that associate the file objects (120) with the data content (168, 172).
3. The method according to claim 1 or 2, wherein receiving source data (11) comprises receiving a data stream generated by an application (13).
4. The method according to claim 1 or 2, wherein receiving source data (11) comprises receiving a data stream generated from streaming data from an application (13).
5. A method according to claim 1 or 2, wherein receiving source data (11) comprises receiving a data stream from a peripheral device.
6. The method according to claim 1 or 2, wherein receiving source data (11) comprises receiving data streams from a plurality of data sources.
7. The method of claim 6, further comprising:
merging the file objects found in first source data and second source data to create a composite file structure.
8. The method of claim 1, wherein grouping the file objects (120) into a file structure (114) representing the structure of the digital content comprises filtering the file objects to select a subset of file objects for the file structure.
9. The method of claim 1, wherein grouping the file objects (120) into a file structure (114) representing the structure of the digital content comprises grouping the file objects into a configuration that is different from the structure of the source data.
10. The method of claim 1, wherein grouping the file objects (120) into a file structure (114) representing the structure of the digital content comprises adding file objects to change the structure of the digital content.
11. The method of claim 1, wherein grouping the data content (168, 172) of the objects into a data content structure (150) includes filtering content to select content for the internal representation.
12. The method of claim 1, wherein grouping the data content (168, 172) of the objects into a data content structure (150) includes adding content to select content for the internal representation.
13. The method of claim 1, further comprising processing the pointers to rearrange the association between the data content and the file objects, so that data content from one source can be replaced by data content from another source.
14. The method of claim 1, further comprising a process for compressing file objects stored in the internal representation (14) by combining file objects having similar attributes.
15. The method of claim 1, further comprising creating a resource table (112) for storing resources identified within a data source.
16. The method of claim 15, wherein the resource comprises a resource selected from the group consisting of a font, a color list, a style, and a link.
17. The method of claim 1, including a data transfer process wherein said data content (168, 172) may be stored or transferred independently of said file structure (114).
18. The method of claim 1, comprising a compression process for compressing the data content.
19. The method of claim 1, comprising an encoding process for encoding the data content.
20. The method of claim 1, comprising a compression process for compressing the file structure.
21. The method of claim 1, comprising an encoding process for encoding the file structure.
22. The method of claim 1, wherein the file object (120) includes location information indicating a location of the content within a file.
23. The method of claim 22, wherein the location information may be relative or fixed location information.
24. The method of claim 1, wherein the file structure (114) defines location information that indicates a location of one object relative to other objects in a file structure.
25. The method of claim 1, wherein the file structure (114) includes file objects having a set of defined parameters including spatial, temporal, and physical.
26. The method of any of claims 22 to 25, wherein the visual location of the content in an internal representation (14) is tracked independently of the structural location of the content in a file.
27. The method of claim 1, wherein the digital content comprises content selected from the group consisting of text, graphics, audio, video, interaction, script, and audio-video.
28. The method of claim 1, further comprising a process for outputting the digital content.
29. The method of claim 1, wherein said process for outputting digital content comprises a process for outputting digital content in a format representative of said internal representation (14).
30. A method according to claim 28 or 29, wherein the process for outputting digital content comprises a process for outputting content in a format compatible with a selected known file format.
31. The method of claim 29, wherein a format representing the internal representation (14) is based on a structure selected from the group consisting of a binary data structure, a text description, a markup text description, and a luminance/chrominance color pattern.
32. The method of claim 29, wherein the format representing the internal representation (14) may be based on a universal text coding model including a code selected from the group consisting of Unicode, shift-mapping, and big-5.
33. The method of claim 1, wherein the file object (120) includes associated style information.
34. The method of claim 33, wherein the style associated with the document text object includes font type, font size, and whether the character is bold, italic, or otherwise styled.
35. The method of claim 33, wherein the style information comprises page style information.
36. A system for representing digital content, the system comprising:
an input mechanism for receiving source data (11) representing said digital content;
means (12) for processing said source data to translate said source data (11) into an internal representation (14) of said digital content, said internal representation (14) having a predetermined format; wherein the source data processing comprises:
identifying objects (34, 36) that appear within the source data;
creating a file object (120) for each object identified within the source data (11), the file object (120) representing an internal representation of the encountered object, and the file object (120) separating the structure of the object from the data content of the object;
grouping the file objects (120) into a file structure (114), the file structure (114) representing a structure of the digital content;
grouping data content (168, 172) of the objects into a data content structure (150); and
providing a set of pointers that associate the file objects (120) in the file structure (114) with the data content (168, 172) stored in the data content structure (150);
the system is characterized in that:
said input mechanism being adapted to receive source data (11) representing said digital content in any one of a plurality of predetermined formats; and the means for processing the source data comprises a plurality of file proxy modules (12), each of the file proxy modules (12) being configured to translate source data (11) in at least one of the plurality of formats into the internal representation format (14);
the input mechanism identifying a particular file proxy module (12) from the plurality of file proxy modules (12), the particular file proxy module (12) for translating the current source data (11) into the internal representation format (14); and
the particular file proxy module (12) performing the processing of the source data, whereby the format of the internal representation (14) is independent of the format of the source data (11).
37. The system of claim 36, including a file proxy module (12), the file proxy module (12) capable of understanding a plurality of file formats.
38. The system according to claim 36 or 37, further comprising a set of object types (16), said object types (16) representing the types of content present in said source data (11).
39. The system of claim 38, wherein the set of object types (16) includes a bitmap object type, a vector graphics object type, a video type, an animation type, a button type, a script, and a text object type.
40. The system of claim 36, wherein the file proxy module (12) identifies the file format by processing a characteristic selected from the group consisting of file content, file name, network type, transport mechanism, and disk type.
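By way of illustration only, the arrangement recited in claims 1, 2, and 13 (file objects held in a file structure, data content held in a separate data content structure, and a set of pointers linking the two) might be sketched in Python as follows. The class names `FileObject` and `InternalRepresentation`, their fields, and the methods `add_object`, `content_of`, and `replace_content` are hypothetical names chosen for this example; they do not appear in the specification and are not asserted to be the claimed implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileObject:
    # Structure only: object type, position, and a key into the indirection list.
    kind: str
    x: int
    y: int
    content_key: int


@dataclass
class InternalRepresentation:
    # File structure: the objects that describe how the content is organized.
    file_structure: List[FileObject] = field(default_factory=list)
    # Data content structure: the raw content, stored apart from the structure.
    data_content: List[bytes] = field(default_factory=list)
    # Indirection list (assumed layout): content_key -> index into data_content.
    indirection: Dict[int, int] = field(default_factory=dict)

    def add_object(self, kind: str, x: int, y: int, content: bytes) -> FileObject:
        self.data_content.append(content)
        key = len(self.indirection)
        self.indirection[key] = len(self.data_content) - 1
        obj = FileObject(kind, x, y, key)
        self.file_structure.append(obj)
        return obj

    def content_of(self, obj: FileObject) -> bytes:
        # Follow the pointer from the file object to its data content.
        return self.data_content[self.indirection[obj.content_key]]

    def replace_content(self, obj: FileObject, content: bytes) -> None:
        # Re-point the object at new content; the file structure is untouched.
        self.data_content.append(content)
        self.indirection[obj.content_key] = len(self.data_content) - 1


rep = InternalRepresentation()
title = rep.add_object("text", 0, 0, b"Hello")
image = rep.add_object("bitmap", 0, 20, b"\x00\x01")
rep.replace_content(title, b"Bonjour")  # swap in content from another source
```

Because a file object holds only a key rather than the content itself, content from one source can be substituted for content from another by changing a single pointer, and the content list can be stored or transferred separately from the structure.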
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GBGB0009129.8A GB0009129D0 (en) | 2000-04-14 | 2000-04-14 | Digital document processing |
GB0009129.8 | 2000-04-14 | ||
US09/703,502 US7055095B1 (en) | 2000-04-14 | 2000-10-31 | Systems and methods for digital document processing |
US09/703,502 | 2000-10-31 | ||
PCT/GB2001/001720 WO2001080069A1 (en) | 2000-04-14 | 2001-04-17 | Systems and methods for digital document processing |
Publications (2)
Publication Number | Publication Date |
---|---|
HK1057111A1 (en) | 2004-03-12 |
HK1057111B (en) | 2005-08-05 |
Also Published As
Publication number | Publication date |
---|---|
AU5049401A (en) | 2001-10-30 |
JP2003531441A (en) | 2003-10-21 |
WO2001080069A1 (en) | 2001-10-25 |
EP1272940A1 (en) | 2003-01-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PC | Patent ceased (i.e. patent has lapsed due to the failure to pay the renewal fee) |
Effective date: 20180417 |