Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terms to which the present invention relates will be explained first:
ePub: a free and open e-book standard whose content is reflowable; that is, the text content can be displayed in the manner most suitable for reading, depending on the characteristics of the reading device.
Portable Document Format (PDF): a file format developed by Adobe Systems for exchanging documents in a manner independent of application programs, operating systems, and hardware.
The specific application scenario of the present invention is as follows. When a user needs to read an ePub file online, the server sends the file in the ePub format to the terminal; the terminal acquires the file, interprets it, and then displays it as a webpage. However, in the prior art, the terminal itself must process the acquired ePub file, and the file contains a large number of elements; to place the file in a webpage for display, the terminal must therefore carry out many processing steps, such as rendering and typesetting. As a result, the terminal displays the ePub file online slowly, which is inconvenient for a user reading online.
The invention provides an electronic file processing method and device, and aims to solve the technical problems in the prior art.
The following describes the technical solutions of the present invention and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of an electronic document processing method according to an embodiment of the present disclosure. As shown in fig. 1, the method includes:
Step 101, receiving a first reading request sent by a terminal, wherein the first reading request includes an identifier of a chapter to be read of an ePub file.
In this embodiment, specifically, the server receives a first reading request sent by the terminal, where the first reading request includes the type of the file to be read, an identifier of the file to be read, and an identifier of the chapter to be read of that file; the server can determine the file to be read from the identifier of the file to be read.
Then, when the type of the file to be read is the ePub format, the server determines that the identifier of the chapter to be read is the identifier of the chapter to be read of the ePub file.
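The request handling in step 101 can be illustrated with a minimal sketch. The field names, the type strings "epub" and "pdf", and the handler names below are assumptions made for illustration; the disclosure does not specify a wire format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadingRequest:
    """A reading request as described in step 101 (field names are hypothetical)."""
    file_type: str   # type of the file to be read, e.g. "epub" or "pdf"
    file_id: str     # identifier of the file to be read
    chapter_id: str  # identifier of the chapter to be read


def route(request: ReadingRequest) -> str:
    """Return the name of the (hypothetical) handler that should serve the request."""
    kind = request.file_type.lower()
    if kind == "epub":
        return "epub_chapter_handler"
    if kind == "pdf":
        return "pdf_picture_handler"
    raise ValueError(f"unsupported file type: {request.file_type!r}")
```

Branching on the file type first keeps the ePub path (chapter content) separate from the PDF path (pictures) described later in the text.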
Step 102, extracting, from the electronic document buffer, the ePub chapter content corresponding to the identifier of the chapter to be read of the ePub file, wherein the electronic document buffer includes the parsed ePub file, and the parsed ePub file includes the identifier of the ePub chapter, the ePub chapter content, and the correspondence between the identifier of the ePub chapter and the ePub chapter content.
In this embodiment, specifically, an electronic document buffer is set in the server, the electronic document buffer includes at least one parsed ePub file, and each parsed ePub file includes an identifier of an ePub chapter, the content of the ePub chapter, and a correspondence between the identifier of the ePub chapter and the content of the ePub chapter. After determining which ePub file the terminal requests, the server may query the electronic document buffer for the parsed ePub file corresponding to that ePub file; the server then determines the ePub chapter content corresponding to the identifier of the chapter to be read of the ePub file according to the correspondence between the identifier of the ePub chapter and the ePub chapter content.
Wherein, the ePub file chapter content includes at least one of the following: text, picture, video, audio.
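The lookup in step 102 might be sketched as follows, assuming the electronic document buffer is modeled as a nested dictionary (file identifier to chapter identifier to chapter content). That layout and the sample identifiers are assumptions; the disclosure only requires that the correspondence exist.

```python
from typing import Dict, Optional

# Assumed layout: file identifier -> (chapter identifier -> chapter content)
DocumentBuffer = Dict[str, Dict[str, str]]


def extract_chapter(buffer: DocumentBuffer, file_id: str, chapter_id: str) -> Optional[str]:
    """Return the chapter content for the given identifiers, or None if absent."""
    parsed_file = buffer.get(file_id)
    if parsed_file is None:
        return None  # the requested file has not been parsed into the buffer
    return parsed_file.get(chapter_id)


# Hypothetical sample data
SAMPLE_BUFFER: DocumentBuffer = {
    "book-1": {"ch1": "<p>Chapter one</p>", "ch2": "<p>Chapter two</p>"},
}
```

Only the requested chapter is returned, which matches the stated goal of sending the terminal nothing beyond the chapter it asked for.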
Step 103, sending the ePub chapter content corresponding to the identification of the chapter to be read of the ePub file to the terminal, so that the terminal displays the ePub chapter content corresponding to the identification of the chapter to be read of the ePub file on the webpage.
In this embodiment, specifically, the server sends the determined ePub chapter content to the terminal. When a file is transmitted between the server and the terminal, the Hypertext Transfer Protocol (HTTP) is used for transmission.
Specifically, the server encrypts the determined ePub chapter content by adopting a JavaScript RSA encryption algorithm (RSA algorithm) to generate encrypted ePub chapter content; the server then sends the encrypted ePub chapter content to the terminal over HTTP.
Then the terminal decrypts the encrypted ePub chapter content to obtain the decrypted ePub chapter content; and the terminal displays the decrypted ePub chapter content on the webpage.
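The encrypt-then-decrypt round trip can be illustrated with a toy textbook RSA sketch. It uses tiny fixed primes and encrypts byte by byte, so it is insecure and is not the JavaScript RSA implementation the text refers to. It also assumes that the server encrypts with the terminal's public key and the terminal decrypts with its private key, which the disclosure does not state. Requires Python 3.8 or later for the modular inverse.

```python
from typing import List, Tuple

Key = Tuple[int, int]  # (exponent, modulus)


def make_toy_keys() -> Tuple[Key, Key]:
    """Build a toy RSA key pair from small fixed primes (illustration only)."""
    p, q = 61, 53
    n = p * q                 # modulus
    phi = (p - 1) * (q - 1)   # Euler's totient of n
    e = 17                    # public exponent, coprime with phi
    d = pow(e, -1, phi)       # private exponent: modular inverse of e
    return (e, n), (d, n)


def encrypt(data: bytes, public_key: Key) -> List[int]:
    """Encrypt each byte separately: c = m^e mod n."""
    e, n = public_key
    return [pow(byte, e, n) for byte in data]


def decrypt(blocks: List[int], private_key: Key) -> bytes:
    """Recover each byte: m = c^d mod n."""
    d, n = private_key
    return bytes(pow(block, d, n) for block in blocks)


PUBLIC_KEY, PRIVATE_KEY = make_toy_keys()
```

A production system would use a vetted cryptographic library with proper padding, and would typically encrypt bulk content with a symmetric key that RSA protects.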
Moreover, the terminal can receive an adding request sent by a user, wherein the adding request includes content to be added; the terminal adds that content to the displayed ePub chapter content. In this way, functions such as adding annotations and bookmarks by the user are supported.
In addition, in the process of this embodiment, the processes of user registration and heartbeat check are completed between the server and the terminal, so as to ensure the communication quality between the terminal and the server; the first reading request can also comprise user information, and the server can judge whether the terminal where the user is located supports reading, whether the user operation request is safe and robust and the like through the user information in the first reading request.
In this embodiment, the server receives a first reading request sent by a terminal, where the first reading request includes an identifier of a chapter to be read of an ePub file; extracts, from an electronic document buffer, the ePub chapter content corresponding to the identifier of the chapter to be read, where the electronic document buffer includes a parsed ePub file, and the parsed ePub file includes the identifier of the ePub chapter, the ePub chapter content, and the correspondence between the identifier of the ePub chapter and the ePub chapter content; and sends that ePub chapter content to the terminal, so that the terminal displays it on a webpage. Therefore, when the terminal requests the ePub file, the server sends only the ePub chapter content that the terminal requested. Because the server has already parsed the ePub file, the ePub chapter content sent to the terminal is parsed content; when the terminal displays the ePub file in a webpage, it does not need to perform rendering, typesetting, or similar processing on the ePub file, so the terminal displays the ePub file online more quickly and the user can read the file online conveniently.
Fig. 2 is a schematic flowchart of another electronic document processing method according to an embodiment of the present application. As shown in fig. 2, the method includes:
Step 201, parsing the ePub file according to a preset ePub file rule to generate a parsed ePub file.
In this embodiment, specifically, the server may parse the ePub file to generate a parsed ePub file.
Specifically, the server first decompresses the ePub file to obtain a decompressed ePub file; the server then parses and classifies the whole decompressed ePub file according to the ePub file rules, using mimetype, content.opf, toc.ncx, and the like. The server can collect and classify pictures in various formats, and can generate a linear reading order and a chapter directory structure for the file. The resulting parsed ePub file includes a chapter directory, chapter contents, and a correspondence between the chapter directory and the chapter contents, where the chapter directory represents the identifier of each chapter of the ePub file.
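The parsing in step 201 might look roughly like the following sketch. It assumes a standard ePub container in which META-INF/container.xml points to the OPF package file, takes the linear reading order from the OPF spine, and uses manifest item ids as chapter identifiers. The handling of toc.ncx and pictures is omitted, and the sample-building helper exists only to make the sketch self-contained.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def parse_epub(data: bytes) -> dict:
    """Decompress an ePub and return its reading order and chapter contents."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        if zf.read("mimetype").decode("ascii").strip() != "application/epub+zip":
            raise ValueError("not an ePub container")
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).attrib["full-path"]
        opf = ET.fromstring(zf.read(opf_path))
        base = posixpath.dirname(opf_path)
        # manifest: item id -> href relative to the OPF file
        manifest = {item.attrib["id"]: item.attrib["href"]
                    for item in opf.findall("opf:manifest/opf:item", OPF_NS)}
        # spine: linear reading order expressed as manifest item ids
        order = [ref.attrib["idref"]
                 for ref in opf.findall("opf:spine/opf:itemref", OPF_NS)]
        chapters = {cid: zf.read(posixpath.join(base, manifest[cid])).decode("utf-8")
                    for cid in order}
    return {"order": order, "chapters": chapters}


def make_sample_epub() -> bytes:
    """Build a tiny two-chapter ePub in memory (hypothetical test data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("META-INF/container.xml",
                    '<container version="1.0" '
                    'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
                    '<rootfiles><rootfile full-path="OEBPS/content.opf" '
                    'media-type="application/oebps-package+xml"/></rootfiles>'
                    '</container>')
        zf.writestr("OEBPS/content.opf",
                    '<package version="2.0" xmlns="http://www.idpf.org/2007/opf">'
                    '<manifest>'
                    '<item id="ch2" href="ch2.xhtml" media-type="application/xhtml+xml"/>'
                    '<item id="ch1" href="ch1.xhtml" media-type="application/xhtml+xml"/>'
                    '</manifest>'
                    '<spine><itemref idref="ch1"/><itemref idref="ch2"/></spine>'
                    '</package>')
        zf.writestr("OEBPS/ch1.xhtml", "<p>One</p>")
        zf.writestr("OEBPS/ch2.xhtml", "<p>Two</p>")
    return buf.getvalue()
```

The returned dictionary carries the two things the text asks for: a chapter directory (the ordered identifiers) and the correspondence from each identifier to its content.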
Step 202, storing the parsed ePub file in an electronic document buffer.
In this embodiment, specifically, the server puts the parsed ePub file into the electronic document buffer for storage. Moreover, the server may wrap the content of the parsed ePub file in multiple layers of caching, and may further implement page-level caching by using OpenSymphony OSCache, which can cache a single file, cache by Uniform Resource Locator (URL) pattern, and set cache attributes.
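As a rough illustration of storing parsed files with a cache attribute, the sketch below implements a small in-memory buffer with a time-to-live. It is a stand-in written for this explanation and does not use or reproduce the OSCache API; the class name, the single expiry attribute, and the injectable clock are all assumptions.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TtlBuffer:
    """In-memory buffer whose entries expire after a fixed time-to-live."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def put(self, key: str, value: Any) -> None:
        """Store a value together with its expiry time."""
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        """Return the stored value, or None if it is missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop the stale entry
            return None
        return value
```

A parsed ePub file could be stored under its file identifier with put and fetched with get when a reading request arrives.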
Step 203, receiving a first reading request sent by the terminal, wherein the first reading request includes an identifier of a chapter to be read of the ePub file.
In this embodiment, specifically, this step may refer to step 101 in fig. 1, and is not described again.
Step 204, extracting, from the electronic document buffer, the ePub chapter content corresponding to the identifier of the chapter to be read of the ePub file, wherein the electronic document buffer includes the parsed ePub file, and the parsed ePub file includes the identifier of the ePub chapter, the ePub chapter content, and the correspondence between the identifier of the ePub chapter and the ePub chapter content.
In this embodiment, specifically, this step may refer to step 102 in fig. 1, and is not described again.
Step 205, sending the ePub chapter content corresponding to the identification of the chapter to be read of the ePub file to the terminal, so that the terminal displays the ePub chapter content corresponding to the identification of the chapter to be read of the ePub file on the webpage.
In this embodiment, specifically, this step may refer to step 103 in fig. 1, and is not described again.
In this embodiment, the ePub file is parsed according to a preset ePub file rule to generate a parsed ePub file, and the parsed ePub file is stored in an electronic document buffer; the ePub file is thus processed and parsed on the server side and does not need to be processed and parsed by the terminal. A first reading request sent by the terminal is then received, where the first reading request includes an identifier of a chapter to be read of the ePub file; the ePub chapter content corresponding to the identifier of the chapter to be read is extracted from the electronic document buffer, where the electronic document buffer includes the parsed ePub file, and the parsed ePub file includes the identifier of the ePub chapter, the ePub chapter content, and the correspondence between the identifier of the ePub chapter and the ePub chapter content; and that ePub chapter content is sent to the terminal, so that the terminal displays it on a webpage. Therefore, when the terminal requests the ePub file, the server sends only the ePub chapter content that the terminal requested. Because the server has already parsed the ePub file, the ePub chapter content sent to the terminal is parsed content; when the terminal displays the ePub file in a webpage, it does not need to perform rendering, typesetting, or similar processing on the ePub file, so the terminal displays the ePub file online more quickly and the user can read the file online conveniently.
In an optional implementation manner, on the basis of the above embodiment, the method may further include the following steps:
Step 301, performing picture segmentation processing on the PDF file to generate a segmented PDF file, where the segmented PDF file includes a picture, an identifier of a PDF chapter, a picture path, and a correspondence between the identifier of the PDF chapter and the picture path.
In this embodiment, specifically, the server first performs picture segmentation processing on a PDF file; the server can segment the PDF file into at least one picture, and the format of the picture can be any one of the following: Tagged Image File Format (TIFF), Portable Network Graphics (PNG), Graphics Interchange Format (GIF), JPEG, Scalable Vector Graphics (SVG), text document (TXT). Moreover, the generated pictures support the fonts embedded in the PDF file.
Specifically, in order to parse and segment PDF files more efficiently and quickly, the server uses a JDK thread pool: the server uses the thread pool to divide the PDF file into small files and then divides each small file into a plurality of pictures; the server may then scale each picture. In this way, the required picture precision can be met while memory overhead and memory occupation are reduced, which in turn reduces the risk of memory overflow.
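The split, render, and scale flow can be sketched as below. The text names a JDK thread pool; this sketch swaps in Python's concurrent.futures.ThreadPoolExecutor as an analogous mechanism. The rendering function is a placeholder that only produces file names, since actual PDF rasterization would need a PDF library; the chunk size, worker count, and naming scheme are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def split_into_chunks(page_numbers: Sequence[int], chunk_size: int) -> List[List[int]]:
    """Divide the page numbers into consecutive small groups ("small files")."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(page_numbers[i:i + chunk_size])
            for i in range(0, len(page_numbers), chunk_size)]


def render_chunk(pages: List[int]) -> List[str]:
    """Placeholder renderer: a real one would rasterize each page to a picture."""
    return [f"page-{number:04d}.png" for number in pages]


def render_all(page_count: int, chunk_size: int = 4, workers: int = 4) -> List[str]:
    """Render all chunks on a thread pool, keeping the pictures in page order."""
    chunks = split_into_chunks(range(1, page_count + 1), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the order of the input chunks
        return [name for chunk in pool.map(render_chunk, chunks) for name in chunk]


def scaled_size(width: int, height: int, max_width: int) -> Tuple[int, int]:
    """Shrink a picture to at most max_width pixels wide, keeping its aspect ratio."""
    if width <= max_width:
        return width, height
    ratio = max_width / width
    return max_width, max(1, round(height * ratio))
```

Working on small chunks bounds how many pages are held in memory at once, which is the memory benefit the paragraph above describes.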
Then, the server configures a picture path for each picture, so as to obtain a segmented PDF file, wherein the segmented PDF file comprises the picture, the identification of the PDF chapter, the picture path, and the corresponding relationship between the identification of the PDF chapter and the picture path. Wherein, the picture path is a URL.
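The correspondence between PDF chapter identifiers and picture paths might be built as in the sketch below. The base URL, the path scheme, and the assumption that one chapter can map to several pictures (one per page) are illustrative choices, not details given in the disclosure.

```python
from typing import Dict, List

# Hypothetical base URL under which the generated pictures are served
BASE_URL = "https://example.com/pictures"


def build_segmented_index(file_id: str,
                          chapter_pages: Dict[str, List[int]]) -> Dict[str, List[str]]:
    """Map each PDF chapter identifier to the URLs of its page pictures."""
    return {
        chapter_id: [f"{BASE_URL}/{file_id}/page-{number:04d}.png" for number in pages]
        for chapter_id, pages in chapter_pages.items()
    }
```

For example, a document "doc-7" whose first chapter spans pages 1 and 2 would map "ch1" to two picture URLs.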
Step 302, storing the picture path and the corresponding relationship between the identifier of the PDF chapter and the picture path in an electronic document buffer.
In this embodiment, specifically, the server stores the obtained picture path and the correspondence between the identifier of the PDF chapter and the picture path in the electronic document buffer.
Moreover, the server can wrap the obtained picture paths in multiple layers of caching, and can further implement page-level caching by using OSCache, which can cache a single file, cache by URL pattern, and set cache attributes. This reduces the time spent loading the generated pictures and the PDF files on the server; the server can cache any Uniform Resource Identifier (URI) through the cache filter function based on Servlet 2.3. In addition, JGroups is integrated to implement a clustered cache, so that files in the electronic document buffer can be acquired more quickly.
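Caching by URL pattern can be illustrated with a small rule table that maps patterns to cache lifetimes. This is a conceptual sketch only: it uses shell-style wildcards from Python's fnmatch module, not the servlet filter mapping syntax or OSCache configuration, and the patterns and lifetimes are invented for the example.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

# Hypothetical rules: (URI pattern, cache lifetime in seconds); first match wins
CACHE_RULES: List[Tuple[str, int]] = [
    ("/pictures/*.png", 3600),
    ("/chapters/*", 300),
]


def cache_ttl_for(uri: str) -> Optional[int]:
    """Return the cache lifetime for a URI, or None if it should not be cached."""
    for pattern, ttl in CACHE_RULES:
        if fnmatchcase(uri, pattern):
            return ttl
    return None
```

Here picture URIs would be cached longer than chapter content, while URIs that match no rule bypass the cache.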
Step 303, receiving a second reading request sent by the terminal, where the second reading request includes an identifier of a chapter to be read of the PDF file.
In this embodiment, specifically, the server receives a second reading request sent by the terminal, where the second reading request includes a type of a file to be read, an identifier of the file to be read, and an identifier of a reading section of the file to be read; and the server can determine the file to be read corresponding to the file to be read identifier.
Then, when the type of the file to be read is a file in the PDF format, the server may determine the identifier of the reading section of the file to be read, which is the identifier of the section of the PDF file to be read.
Step 304, determining, according to the electronic document buffer, a picture path corresponding to the identifier of the chapter to be read of the PDF file, wherein the electronic document buffer further includes the correspondence between the identifier of the PDF chapter and the picture path.
In this embodiment, specifically, because the electronic document buffer has a correspondence between the identifier of the PDF chapter and the picture path, the server may determine the picture path corresponding to the identifier of the chapter to be read of the current PDF file.
Step 305, determining the picture corresponding to the picture path according to a preset correspondence between picture paths and pictures.
In this embodiment, specifically, the server stores a corresponding relationship between the picture path and the picture, and the server can determine the picture corresponding to the picture path.
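Steps 304 and 305 amount to a two-stage lookup: chapter identifier to picture path, then picture path to picture. A minimal sketch follows, assuming both correspondences are held as dictionaries and pictures are raw bytes; these representations and the sample paths are assumptions.

```python
from typing import Dict, List


def pictures_for_chapter(chapter_to_paths: Dict[str, List[str]],
                         path_to_picture: Dict[str, bytes],
                         chapter_id: str) -> List[bytes]:
    """Resolve a PDF chapter identifier to its pictures via the picture paths."""
    # Step 304: chapter identifier -> picture paths
    paths = chapter_to_paths.get(chapter_id, [])
    # Step 305: picture path -> picture (paths without a stored picture are skipped)
    return [path_to_picture[path] for path in paths if path in path_to_picture]
```

Keeping only paths in the buffer and resolving pictures separately lets the buffer stay small while the picture data lives elsewhere on the server.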
Step 306, sending the picture corresponding to the picture path to the terminal, so that the terminal displays the picture corresponding to the picture path on the webpage.
In this embodiment, specifically, the server sends the determined picture to the terminal. Pictures are transmitted between the server and the terminal over the Hypertext Transfer Protocol (HTTP).
The terminal may then display the received picture on a webpage. Moreover, the terminal can receive an adding request sent by a user, wherein the adding request includes content to be added; the terminal adds that content to the displayed picture. In this way, functions such as adding annotations and bookmarks by the user are supported.
Through the above steps, the server performs picture segmentation on the PDF file, segmenting it into a plurality of pictures and scaling the pictures; the server sends the picture corresponding to the chapter to be read of the PDF file to the terminal; and the terminal displays the picture on the webpage. The terminal does not need to process the PDF file and can display its content directly, so that the user can conveniently read the PDF file online.
Fig. 3 is a schematic structural diagram of an electronic document processing apparatus according to an embodiment of the present invention, and as shown in fig. 3, the apparatus according to the embodiment may include:
a first receiving module 31, configured to receive a first reading request sent by a terminal, where the first reading request includes an identifier of a to-be-read chapter of an ePub file;
the extracting module 32 is configured to extract, from the electronic document buffer, ePub chapter content corresponding to an identifier of a chapter to be read of the ePub file, where the electronic document buffer includes the parsed ePub file, and the parsed ePub file includes the identifier of the ePub chapter, the ePub chapter content, and a correspondence between the identifier of the ePub chapter and the ePub chapter content;
the first sending module 33 is configured to send the ePub chapter content corresponding to the identifier of the chapter to be read of the ePub file to the terminal, so that the terminal displays the ePub chapter content corresponding to the identifier of the chapter to be read of the ePub file on the web page.
The electronic document processing apparatus of this embodiment can execute the electronic document processing method provided by the embodiment of the present invention, and the implementation principles thereof are similar, and are not described herein again.
In this embodiment, a first reading request sent by a terminal is received, where the first reading request includes an identifier of a chapter to be read of an ePub file; the ePub chapter content corresponding to the identifier of the chapter to be read is extracted from an electronic document buffer, where the electronic document buffer includes a parsed ePub file, and the parsed ePub file includes the identifier of the ePub chapter, the ePub chapter content, and the correspondence between the identifier of the ePub chapter and the ePub chapter content; and that ePub chapter content is sent to the terminal, so that the terminal displays it on a webpage. Therefore, when the terminal requests the ePub file, the server sends only the ePub chapter content that the terminal requested. Because the server has already parsed the ePub file, the ePub chapter content sent to the terminal is parsed content; when the terminal displays the ePub file in a webpage, it does not need to perform rendering, typesetting, or similar processing on the ePub file, so the terminal displays the ePub file online more quickly and the user can read the file online conveniently.
Fig. 4 is a schematic structural diagram of another electronic document processing apparatus according to an embodiment of the present invention, and based on the embodiment shown in fig. 3, as shown in fig. 4, the apparatus according to the embodiment further includes:
the parsing module 41 is configured to, before the first receiving module 31 receives the first reading request sent by the terminal, parse the ePub file according to a preset ePub file rule to generate a parsed ePub file;
and the first storage module 42 is configured to store the parsed ePub file in an electronic document buffer.
The apparatus provided in this embodiment further includes:
a second receiving module 43, configured to receive a second reading request sent by the terminal, where the second reading request includes an identifier of a to-be-read chapter of the PDF file;
the first determining module 44 is configured to determine, according to an electronic document buffer, a picture path corresponding to an identifier of a to-be-read chapter of a PDF file, where the electronic document buffer further includes a correspondence between the identifier of the PDF chapter and the picture path;
a second determining module 45, configured to determine, according to a preset correspondence between the picture paths and the pictures, the pictures corresponding to the picture paths;
and a second sending module 46, configured to send the picture corresponding to the picture path to the terminal, so that the terminal displays the picture corresponding to the picture path on the web page.
The apparatus provided in this embodiment further includes:
the segmentation module 47 is configured to perform picture segmentation on the PDF file before the second receiving module 43 receives the second reading request sent by the terminal, so as to generate a segmented PDF file, where the segmented PDF file includes a picture, an identifier of a PDF chapter, a picture path, and a corresponding relationship between the identifier of the PDF chapter and the picture path;
and the second storage module 48 is configured to store the picture path and the corresponding relationship between the identifier of the PDF chapter and the picture path in the electronic document buffer.
The ePub file chapter content includes at least one of the following: text, picture, video, audio.
The electronic document processing apparatus of this embodiment can execute another electronic document processing method provided by the embodiment of the present invention, and the implementation principles thereof are similar, and are not described herein again.
In this embodiment, the ePub file is parsed according to a preset ePub file rule to generate a parsed ePub file, and the parsed ePub file is stored in an electronic document buffer; the ePub file is thus processed and parsed on the server side and does not need to be processed and parsed by the terminal. A first reading request sent by the terminal is then received, where the first reading request includes an identifier of a chapter to be read of the ePub file; the ePub chapter content corresponding to the identifier of the chapter to be read is extracted from the electronic document buffer, where the electronic document buffer includes the parsed ePub file, and the parsed ePub file includes the identifier of the ePub chapter, the ePub chapter content, and the correspondence between the identifier of the ePub chapter and the ePub chapter content; and that ePub chapter content is sent to the terminal, so that the terminal displays it on a webpage. Therefore, when the terminal requests the ePub file, the server sends only the ePub chapter content that the terminal requested. Because the server has already parsed the ePub file, the ePub chapter content sent to the terminal is parsed content; when the terminal displays the ePub file in a webpage, it does not need to perform rendering, typesetting, or similar processing on the ePub file, so the terminal displays the ePub file online more quickly and the user can read the file online conveniently.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a logical division, and an actual implementation may use another division; for instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in an electrical, mechanical, or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.