Disclosure of Invention
In view of the above, an object of the present application is to provide a method, an apparatus, a storage medium, and an electronic device for retrieving file contents.
Based on the above object, the present application provides a method for retrieving file contents, comprising:
creating a content object pointing to text content in a file corresponding to each file type based on a plurality of preset file types;
constructing a file index set corresponding to a plurality of files, and constructing a target list according to preset search conditions based on the files corresponding to the file index set, wherein the search conditions comprise key contents and file types;
for each file corresponding to the target list, reading the text content of the file by using the content object corresponding to the file, and retrieving whether the key content exists in the text content of the file;
and taking each file whose text content contains the key content as a candidate file, forming the file names of the candidate files into a result list and displaying the result list, and, upon selection of any file name in the result list, displaying the text content corresponding to the file name and/or calling the file corresponding to the file name.
Further, the file types include a TXT type, a Word type, an Excel type, a PPT type and a PDF type; the files comprise TXT files, Word files, Excel files, PPT files and PDF files;
creating a content object that points to text content in a file for each file type, comprising:
creating a first character object pointing to characters in a TXT file and creating a first image object pointing to images in the TXT file according to the TXT type, wherein the first character object is used for reading the character content in the TXT file and the first image object is used for reading the image content in the TXT file;
creating a second character object pointing to a character in a Word file and creating a second image object pointing to an image in the Word file, wherein the second character object is used for reading character content in the Word file, and the second image object is used for reading image content in the Word file;
creating a third character object pointing to characters in an Excel file and creating a third image object pointing to images in the Excel file, wherein the third character object is used for reading character contents in the Excel file, and the third image object is used for reading image contents in the Excel file;
creating a fourth character object pointing to characters in the PPT file and creating a fourth image object pointing to images in the PPT file according to the PPT type, wherein the fourth character object is used for reading character contents in the PPT file, and the fourth image object is used for reading image contents in the PPT file;
and creating a fifth character object pointing to characters in the PDF file and a fifth image object pointing to an image in the PDF file according to the PDF type, wherein the fifth character object is used for reading the character content in the PDF file, and the fifth image object is used for reading the image content in the PDF file.
Further, creating a second character object pointing to a character in the Word file, and creating a second image object pointing to an image in the Word file, comprising:
creating a primary Word object pointing to a Word program, and creating a secondary Word object pointing to a Word file under the primary Word object, wherein the primary Word object is used for determining the Word file, and the secondary Word object is used for reading the Word file;
creating the second character object and the second image object under the secondary Word object;
creating a third character object pointing to a character in the Excel file, and creating a third image object pointing to an image in the Excel file, comprising:
creating a primary Excel object pointing to an Excel program, and creating a secondary Excel object pointing to an Excel file under the primary Excel object, wherein the primary Excel object is used for determining the Excel file, and the secondary Excel object is used for reading the Excel file;
creating the third character object and the third image object under the secondary Excel object;
creating a fourth character object that points to a character in the PPT file and creating a fourth image object that points to an image in the PPT file, comprising:
creating a primary PPT object pointing to a PPT program, and creating a secondary PPT object pointing to a PPT file under the primary PPT object, wherein the primary PPT object is used for determining the PPT file, and the secondary PPT object is used for reading the PPT file;
creating the fourth character object and the fourth image object under the secondary PPT object;
creating a fifth character object pointing to a character in the PDF file and creating a fifth image object pointing to an image in the PDF file, comprising:
creating a primary PDF object pointing to a PDF program, and creating a secondary PDF object pointing to a PDF file under the primary PDF object, wherein the primary PDF object is used for determining the PDF file, and the secondary PDF object is used for reading each page in the PDF file;
and creating the fifth character object and the fifth image object under the secondary PDF object.
Further, constructing a file index set corresponding to the plurality of files, including:
scanning all files in a database to be retrieved, and constructing the file name of each file and a corresponding file storage path as an index entry corresponding to the file;
and forming all index entries into a file index set corresponding to all files in the database.
Further, based on the file corresponding to the file index set, constructing a target list according to a preset search condition, including:
determining a file with the same file type as the file type in the retrieval condition from the files corresponding to the file index set;
and forming index entries corresponding to the files with the same file types into the target list.
Further, before retrieving whether the key content exists in the text content of the file, the method further includes:
determining whether each file is encrypted;
in response to determining that any file is encrypted, determining whether a decryption password is available for the file;
in response to determining that the decryption password is available, decrypting the file with the decryption password;
in response to determining that the decryption password is not available, skipping retrieval of the file.
Further, the search condition also includes a file name;
after constructing the file index set corresponding to the plurality of files, the method further comprises:
in response to the search condition including a file name, searching the file index set for a file whose file name is consistent with the file name in the search condition, and taking the file as the candidate file.
Based on the same inventive concept, the application also provides a retrieval device of file content, comprising: an object creation module, a target list construction module, a retrieval module and a display module;
the object creation module is configured to create a content object pointing to text content in a file corresponding to each file type based on a plurality of preset file types;
the target list construction module is configured to construct a file index set corresponding to a plurality of files, and construct a target list according to preset retrieval conditions based on the files corresponding to the file index set, wherein the retrieval conditions comprise key contents and file types;
the retrieval module is configured to, for each file corresponding to the target list, read text content of the file by using a content object corresponding to the file, and retrieve whether the key content exists in the text content of the file;
the display module is configured to take each file whose text content contains the key content as a candidate file, form the file names of the candidate files into a result list and display the result list, and, upon selection of any file name in the result list, display the text content corresponding to the file name and/or call the file corresponding to the file name.
Based on the same inventive concept, the application also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor implements the method for retrieving file contents according to any one of the above when executing the program.
Based on the same inventive concept, the present application also provides a non-transitory computer readable storage medium, wherein the non-transitory computer readable storage medium stores computer instructions for causing a computer to perform the method for retrieving file contents as described above.
As can be seen from the above description, the method, the device, the storage medium and the electronic equipment for retrieving file contents provided by the application create a respective content object for each file type of the files to be retrieved, so that text content of different file types can be read and retrieval across file types is realized. By constructing a file index set, retrieval is carried out from the file index set without locating each file one by one, which improves efficiency. Meanwhile, the files to be retrieved are preliminarily screened by file type according to the search conditions, which improves the efficiency of retrieving the key content. Finally, when the candidate files are displayed, the text content can be read from the candidate files through the created content objects, so that the text content is displayed.
Detailed Description
The present application will be further described in detail below with reference to specific embodiments and with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present application more apparent.
It should be noted that unless otherwise defined, technical or scientific terms used in the embodiments of the present application should be given the ordinary meaning as understood by one of ordinary skill in the art to which the present application belongs. The terms "first," "second," and the like, as used in embodiments of the present application, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that elements or items preceding the word are included in the element or item listed after the word and equivalents thereof, but does not exclude other elements or items. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which may also be changed when the absolute position of the object to be described is changed.
As described in the background section, related file content retrieval methods have difficulty meeting the needs of actual retrieval work.
In the process of implementing the present application, the applicant found that the main problems of related file content retrieval methods are as follows: they only support searching by file name and cannot directly search file contents, which makes file searching inconvenient and inefficient; moreover, they cannot support retrieval and display of multiple kinds of files, and cannot perform unified retrieval when the files to be retrieved are of multiple file types.
Based on this, one or more embodiments of the present application provide a method of retrieving file contents.
Embodiments of the present application are described in detail below with reference to the accompanying drawings.
Referring to fig. 1, a method for retrieving file contents according to an embodiment of the present application includes the steps of:
step S101, creating a content object pointing to text content in a file corresponding to each file type based on a plurality of preset file types.
In the embodiment of the application, a plurality of files to be retrieved can be distinguished according to a plurality of preset file types, and a different content object can be created for the files of each file type, wherein each content object points to the text content of files of the corresponding file type.
Specifically, a plurality of file types may be preset, including: TXT type, Word type, Excel type, PPT type, PDF type, and the like.
Based on this, all files in the database to be retrieved can be classified into TXT files, Word files, Excel files, PPT files, and PDF files according to the file types described above.
Further, for a TXT file, a first content object may be created that points to the text content in its file.
Specifically, in a TXT file, its text content may include characters and images, based on which a first character object and a first image object may be created for the TXT file.
Wherein the first character object points to a character in the text content of the TXT file and the first image object points to an image in the text content of the TXT file.
Specifically, a primary TXT object may be first created that points to a TXT program, with which the TXT file may be determined and read out of a plurality of files of different file types.
Further, the primary TXT object may be utilized as a parent object to create a next child object, i.e., a first character object and a first image object under the primary TXT object, the first character object may be used to read or extract character content in the TXT file, and the first image object may be used to read or extract image content in the TXT file.
Further, in the Word file, the text content thereof may also include characters and images, on the basis of which a second character object and a second image object with respect to the Word file may be set.
Wherein the second character object points to a character in the text content of the Word file and the second image object points to an image in the text content of the Word file.
Specifically, a primary Word object pointing to a Word program may be first created, with which a Word file may be determined from among a plurality of files of different file types.
Further, the primary Word object may be utilized as a parent object to create a next child object, i.e., a secondary Word object is created under the primary Word object, which may be used to read or extract the determined Word file.
Based on this, the secondary Word object may be utilized as a parent object to create child objects of the secondary Word object, i.e., a second character object that may be used to read or extract character content in the Word file and a second image object that may be used to read or extract image content in the Word file.
Further, in an Excel file, the text content thereof may also include characters and images, on the basis of which a third character object and a third image object with respect to the Excel file may be set.
Wherein the third character object points to a character in the text content of the Excel file and the third image object points to an image in the text content of the Excel file.
Specifically, a primary Excel object pointing to an Excel program may be first created, with which an Excel file may be determined from among a plurality of files of different file types.
Further, the primary Excel object may be utilized as a parent object to create a next child object, i.e., a secondary Excel object under the primary Excel object, which may be used to read or extract the determined Excel file, i.e., the Excel workbook.
Based on this, the secondary Excel object may be used as a parent object to create child objects of the secondary Excel object, that is, a third character object and a third image object, which may be used to traverse each cell in the Excel workbook and read or extract the valid cells therein; further, the third character object may read or extract the character content in the valid cells, and the third image object may read or extract the image content in the valid cells.
Further, in the PPT file, the text content thereof may also include characters and images, on the basis of which a fourth character object and a fourth image object with respect to the PPT file may be set.
Wherein the fourth character object points to a character in the text content of the PPT file and the fourth image object points to an image in the text content of the PPT file.
Specifically, a primary PPT object pointing to a PPT program may be first created, with which a PPT file may be determined from among a plurality of files of different file types.
Further, the primary PPT object may be utilized as a parent object to create a next child object, i.e., a secondary PPT object under the primary PPT object, which may be used to read or extract the determined PPT file.
Based on this, the secondary PPT object may be utilized as a parent object to create child objects of the secondary PPT object, i.e., a fourth character object and a fourth image object, which may be used to traverse the respective slides in the PPT file; further, the fourth character object may read or extract character content in the text boxes of each slide, and the fourth image object may read or extract image content in each slide.
Further, in the PDF file, the text content thereof may also include characters and images, on the basis of which a fifth character object and a fifth image object with respect to the PDF file may be set.
Wherein the fifth character object points to a character in the text content of the PDF file and the fifth image object points to an image in the text content of the PDF file.
Specifically, a primary PDF object pointing to a PDF program may be first created with which PDF files may be determined among a plurality of files of different file types.
Further, the primary PDF object may be utilized as a parent object to create a next child object, i.e., a secondary PDF object under the primary PDF object, which may be used to read or extract the determined PDF file.
Based on this, a child level object of the secondary PDF object, that is, a fifth character object and a fifth image object, which can be used to read or extract each page in the PDF file, can be created using the secondary PDF object as a parent level object, and further, the fifth character object can read or extract character content in each page, and the fifth image object can read or extract image content in each page.
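The parent/child object hierarchy described above for step S101 can be sketched as follows. This is only a minimal illustration under stated assumptions: the class names `ProgramObject`, `FileObject`, `CharacterObject` and `ImageObject`, the extension tuples, and the `fake_word_loader` placeholder are all hypothetical, and the loader callable stands in for whatever Word, Excel, PPT or PDF interface an implementation actually uses, which the present description does not name.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-in for a parsed file: (character content, image content).
Parsed = Tuple[str, List[bytes]]


@dataclass
class ProgramObject:
    """Primary object: determines which files belong to one file type."""
    file_type: str
    extensions: Tuple[str, ...]      # assumed mapping from type to extensions
    loader: Callable[[str], Parsed]  # assumed parser for this file type

    def matches(self, path: str) -> bool:
        return path.lower().endswith(self.extensions)

    def open(self, path: str) -> "FileObject":
        return FileObject(parent=self, path=path)


@dataclass
class FileObject:
    """Secondary object: reads one file determined by its primary object."""
    parent: ProgramObject
    path: str

    def character_object(self) -> "CharacterObject":
        return CharacterObject(parent=self)

    def image_object(self) -> "ImageObject":
        return ImageObject(parent=self)


@dataclass
class CharacterObject:
    """Child of the secondary object: reads the character content."""
    parent: FileObject

    def read(self) -> str:
        return self.parent.parent.loader(self.parent.path)[0]


@dataclass
class ImageObject:
    """Child of the secondary object: reads the image content."""
    parent: FileObject

    def read(self) -> List[bytes]:
        return self.parent.parent.loader(self.parent.path)[1]


def fake_word_loader(path: str) -> Parsed:
    # Placeholder only: a real loader would parse the Word file at `path`.
    return (f"text of {path}", [b"img"])


word = ProgramObject("Word", (".doc", ".docx"), fake_word_loader)
```

In this sketch the same four classes serve every file type; only the loader and the extension tuple differ, which is one possible way to keep the per-type objects uniform.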
Step S102, constructing a file index set corresponding to a plurality of files, and constructing a target list according to preset search conditions based on the files corresponding to the file index set, wherein the search conditions comprise key contents and file types.
In the embodiment of the application, a file index set can be established for all files in the database to be retrieved, and a subset can be established based on the file index set so that retrieval can be carried out in a smaller range.
Specifically, all files in the database to be retrieved can be scanned to obtain the file name of each file and the storage path corresponding to the file, and the file name and the corresponding storage path are constructed as the index entry corresponding to the file.
In this way, an index entry containing the file name and the corresponding storage path is formed for each file in the database to be retrieved, and accordingly, all the index entries can be formed into a file index set.
Wherein the file index set corresponds to all files in the database to be retrieved.
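The index construction described above can be sketched as follows. The sketch assumes, purely for illustration, that the "database to be retrieved" is a directory tree on a local file system; `IndexEntry` and `build_index_set` are hypothetical names.

```python
import os
from typing import List, NamedTuple


class IndexEntry(NamedTuple):
    file_name: str
    storage_path: str


def build_index_set(root: str) -> List[IndexEntry]:
    """Scan every file under `root` and record its name and storage path."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            entries.append(IndexEntry(name, os.path.join(dirpath, name)))
    # Sorting gives a stable order regardless of directory traversal order.
    return sorted(entries)
```

Because each entry stores only the name and path, the index can be built without opening any file; file contents are read later, and only for files that reach the target list.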
In this embodiment, as shown in fig. 2, when performing file search, step S201 may be performed first, and search conditions may be set.
In particular, the search conditions may include file type and key content; one or more file types can be set, and the key content can be key characters in the form of characters or key pictures in the form of pictures.
Further, when searching is performed based on the obtained file index set, according to the file type set in the search condition, an index entry consistent with the file type in the search condition can be selected from the file index set, and the index entry can be formed into a target list.
It can be seen that, when the file types set in the search condition are the Word type, the Excel type and the PPT type, the files correspondingly determined in the target list are Word files, Excel files and PPT files, that is, step S202 in fig. 2 determines the Word files, Excel files and PPT files.
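The screening by file type can be sketched as follows. The extension table `EXTENSIONS` is an assumption (the description names the file types but not how a file's type is recognized), and `file_type_of` and `build_target_list` are hypothetical helper names.

```python
import os
from typing import Iterable, List, Optional, Tuple

# Assumed mapping from the preset file types to file name extensions.
EXTENSIONS = {
    "TXT": {".txt"},
    "Word": {".doc", ".docx"},
    "Excel": {".xls", ".xlsx"},
    "PPT": {".ppt", ".pptx"},
    "PDF": {".pdf"},
}

IndexEntry = Tuple[str, str]  # (file name, storage path)


def file_type_of(file_name: str) -> Optional[str]:
    """Return the preset file type of a file name, or None if unrecognized."""
    ext = os.path.splitext(file_name)[1].lower()
    for file_type, exts in EXTENSIONS.items():
        if ext in exts:
            return file_type
    return None


def build_target_list(index_set: Iterable[IndexEntry],
                      wanted_types: Iterable[str]) -> List[IndexEntry]:
    """Keep only index entries whose file type is in the search condition."""
    wanted = set(wanted_types)
    return [entry for entry in index_set if file_type_of(entry[0]) in wanted]
```

For the example of fig. 2, calling `build_target_list(index_set, ["Word", "Excel", "PPT"])` would keep the Word, Excel and PPT entries and drop the TXT and PDF entries.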
Step S103, for each file corresponding to the target list, reads the text content of the file by using the content object corresponding to the file, and searches whether the key content exists in the text content of the file.
In the embodiment of the present application, based on the above-determined target list, the search may be performed in all files corresponding to the target list.
Specifically, based on the created content objects corresponding to the file types, the content objects corresponding to the different file types can be utilized to read out text content from the files, and key content in the search condition is searched in the text content.
In the specific example shown in fig. 2, the second character object may be used to read the character content of each Word file corresponding to the target list, and the second image object may be used to read the image content of each Word file corresponding to the target list; the third character object may be used to read the character content of each Excel file corresponding to the target list, and the third image object may be used to read the image content of each Excel file corresponding to the target list; and the fourth character object may be used to read the character content of each PPT file corresponding to the target list, and the fourth image object may be used to read the image content of each PPT file corresponding to the target list.
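Step S103 can be sketched as a loop over the target list. The sketch covers character content only, since the description does not specify how a key picture is compared with image content; `retrieve` is a hypothetical name, and each reader callable stands in for the character object of one file type.

```python
from typing import Callable, Dict, Iterable, List, Tuple

IndexEntry = Tuple[str, str, str]  # (file name, storage path, file type)
TextReader = Callable[[str], str]  # stand-in for a character object


def retrieve(target_list: Iterable[IndexEntry],
             readers: Dict[str, TextReader],
             key_content: str) -> List[IndexEntry]:
    """Return the entries whose character content contains the key content."""
    candidates = []
    for entry in target_list:
        _name, path, file_type = entry
        reader = readers.get(file_type)
        if reader is None:
            continue  # no content object was created for this file type
        if key_content in reader(path):
            candidates.append(entry)
    return candidates
```

The plain substring test is an assumption; an implementation could equally use case-insensitive or pattern-based matching.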
Step S104, taking each file whose text content contains the key content as a candidate file, forming the file names of the candidate files into a result list and displaying the result list, and, upon selection of any file name in the result list, displaying the text content corresponding to the file name and/or calling the file corresponding to the file name.
In an embodiment of the present application, based on the above, for any file corresponding to the target list, when the text content of the file contains character content and/or image content consistent with the key content, the file can be used as a candidate file, and the candidate file is displayed, so that the user can select the target file from the candidate files.
In the specific example shown in fig. 2, before the files in the target list are retrieved, it may also be determined whether each file in the target list has been encrypted.
Further, if the file is not encrypted, step S203 may be performed to retrieve text content.
Further, if the file is encrypted, step S204 may be executed to determine whether a decryption password has been preset, where the decryption password may be preset in the execution body of the method.
Further, if no decryption password is preset for the encrypted file, the text content is considered unretrievable, and step S205 may be further executed to abandon retrieval of the file.
Further, if the decryption password is preset for the encrypted file, the file may be decrypted with the decryption password, and step S203 may then be executed to retrieve the text content of the file.
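The encryption branch of fig. 2 (steps S203 to S205) can be sketched as follows. All names here are hypothetical: `is_encrypted` and `read_text` stand in for type-specific operations that the description does not detail, and `passwords` models the preset decryption passwords as a mapping from storage path to password.

```python
from typing import Callable, Dict, Optional


def retrieve_with_decryption(
    path: str,
    key_content: str,
    is_encrypted: Callable[[str], bool],
    read_text: Callable[[str, Optional[str]], str],
    passwords: Dict[str, str],
) -> Optional[bool]:
    """Return True/False for a hit/miss, or None if retrieval is abandoned."""
    password = None
    if is_encrypted(path):
        password = passwords.get(path)  # S204: is a password preset?
        if password is None:
            return None                 # S205: abandon retrieval of this file
    # S203: read the (decrypted, if necessary) text and search it.
    return key_content in read_text(path, password)
```

Returning `None` rather than `False` lets a caller distinguish a file that was skipped from a file that was searched without a hit.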
Further, for the retrieved candidate files, the respective file names may be formed into a result list, and the result list is presented to the user, that is, step S206 is performed, where the file names are displayed in the result list.
In other embodiments, a file name may be included in the search criteria, and the file name may be used to perform a search in the file index set.
In the specific example shown in fig. 2, if a file name is set in the search condition, the search of the file type may be skipped, and the process of constructing the target list may be skipped, and the set file name may be directly searched in the file index set.
Further, if a consistent file name is retrieved in the file index set, the file corresponding to the file name may be listed as a candidate file and displayed in the result list, that is, step S206 may be performed.
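The file name branch can be sketched as follows. The sketch assumes that a "consistent" file name means an exact match, which the description does not state explicitly; `candidates_by_name` is a hypothetical name.

```python
from typing import Iterable, List, Optional, Tuple

IndexEntry = Tuple[str, str]  # (file name, storage path)


def candidates_by_name(index_set: Iterable[IndexEntry],
                       file_name: Optional[str]) -> Optional[List[IndexEntry]]:
    """Search the index set directly when a file name is set.

    Returns None when no file name is set, signalling that the caller
    should fall back to the file type and key content search.
    """
    if not file_name:
        return None
    return [entry for entry in index_set if entry[0] == file_name]
```

Since only the index set is consulted, this branch never opens a file, so no content object is needed for it.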
Further, in addition to the file names of the candidate files being shown to the user in the result list, a content list may be created, and after the user selects an arbitrary file name from the result list, the text content of the corresponding file is shown in the content list, that is, step S207 is performed to display the text content.
Specifically, the user may complete the selection of the file name by, for example, clicking the left mouse button.
When the text content is displayed, a part, which is consistent with the key content, of the text content can be highlighted.
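Highlighting of the matching part can be sketched as follows for character content. The marker strings are an assumption; a graphical interface would more likely change the background colour of the matched span, and `highlight` is a hypothetical name.

```python
import re


def highlight(text: str, key_content: str,
              left: str = "[[", right: str = "]]") -> str:
    """Wrap every occurrence of `key_content` in `text` with markers."""
    if not key_content:
        return text
    # re.escape makes the key content match literally, even if it contains
    # characters such as "+" or "." that are special in regular expressions.
    pattern = re.escape(key_content)
    return re.sub(pattern, lambda m: f"{left}{m.group(0)}{right}", text)
```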
Further, in addition to displaying the file names of the candidate files and the text content to the user, the user may select an arbitrary file name from the result list to open the folder in which the file corresponding to the file name is located, that is, execute step S208 to open the folder.
Specifically, the user may complete the selection of the file name by, for example, clicking a right mouse button.
Further, after selecting an arbitrary file name from the result list, the user may also open the corresponding file, that is, execute step S209 to open the file.
Specifically, the user may complete the selection of the file name by, for example, double clicking the left mouse button.
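The three example selections above (left click, right click, left double click) can be sketched as a simple dispatch table. The `Gesture` enumeration and `make_dispatcher` are hypothetical names, and the three handlers are supplied by the caller because the description leaves the user interface toolkit open.

```python
from enum import Enum, auto
from typing import Any, Callable


class Gesture(Enum):
    LEFT_CLICK = auto()         # display the text content
    RIGHT_CLICK = auto()        # open the folder containing the file
    LEFT_DOUBLE_CLICK = auto()  # open the file itself


def make_dispatcher(show_text: Callable[[str], Any],
                    open_folder: Callable[[str], Any],
                    open_file: Callable[[str], Any]) -> Callable[[Gesture, str], Any]:
    """Bind each example gesture to the action described for it."""
    handlers = {
        Gesture.LEFT_CLICK: show_text,
        Gesture.RIGHT_CLICK: open_folder,
        Gesture.LEFT_DOUBLE_CLICK: open_file,
    }

    def dispatch(gesture: Gesture, file_name: str) -> Any:
        return handlers[gesture](file_name)

    return dispatch
```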
Therefore, the method for retrieving file contents according to the embodiment of the application creates a respective content object for each file type of the files to be retrieved, so that text content of different file types can be read and retrieval across file types is realized. By constructing a file index set, retrieval is carried out from the file index set without locating each file one by one, which improves efficiency. Meanwhile, the files to be retrieved are preliminarily screened by file type according to the search conditions, which improves the efficiency of retrieving the key content. Finally, when the candidate files are displayed, the text content can be read from the candidate files through the created content objects, so that the text content is displayed.
It should be noted that, the method of the embodiment of the present application may be performed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene, and is completed by mutually matching a plurality of devices. In the case of such a distributed scenario, one of the devices may perform only one or more steps of the method of an embodiment of the present application, the devices interacting with each other to complete the method.
It should be noted that the foregoing describes some embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Based on the same inventive concept, the embodiment of the application also provides a retrieval device of file content, which corresponds to the method of any embodiment.
Referring to fig. 3, the file content retrieving apparatus includes: an object creation module 301, a target list construction module 302, a retrieval module 303, and a presentation module 304;
wherein, the object creating module 301 is configured to create, based on a plurality of preset file types, a content object pointing to text content in a file corresponding to each file type;
the target list construction module 302 is configured to construct a file index set corresponding to a plurality of files, and construct a target list according to preset search conditions based on the files corresponding to the file index set, where the search conditions include key content and file types;
the retrieving module 303 is configured to, for each file corresponding to the target list, read text content of the file by using a content object corresponding to the file, and retrieve whether the key content exists in the text content of the file;
the presentation module 304 is configured to take each file whose text content contains the key content as a candidate file, form the file names of the candidate files into a result list and display the result list, and, upon selection of any file name in the result list, display the text content corresponding to the file name and/or call the file corresponding to the file name.
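The division into modules 301 to 304 can be sketched as one class whose members correspond to the four modules. This is a simplified illustration only: `RetrievalDevice` is a hypothetical name, index entries are reduced to (file name, file type) pairs, and each content object is modelled as a callable returning character content.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set, Tuple

Entry = Tuple[str, str]  # simplified index entry: (file name, file type)


@dataclass
class RetrievalDevice:
    # Module 301: one reader per file type, standing in for a content object.
    content_objects: Dict[str, Callable[[str], str]]

    def build_target_list(self, index_set: Iterable[Entry],
                          file_types: Set[str]) -> List[Entry]:
        """Module 302: screen the index set by file type."""
        return [e for e in index_set if e[1] in file_types]

    def retrieve(self, target_list: Iterable[Entry],
                 key_content: str) -> List[Entry]:
        """Module 303: keep files whose text contains the key content."""
        return [e for e in target_list
                if key_content in self.content_objects[e[1]](e[0])]

    def result_list(self, candidates: Iterable[Entry]) -> List[str]:
        """Module 304: form the file names of the candidates into a list."""
        return [name for name, _file_type in candidates]
```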
For convenience of description, the above device is described as being functionally divided into various modules. Of course, when implementing an embodiment of the present application, the functions of the modules may be implemented in one or more pieces of software and/or hardware.
The device of the foregoing embodiment is used to implement the corresponding method for retrieving file contents in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not repeated here.
Based on the same inventive concept, the embodiment of the application also provides an electronic device corresponding to the method of any embodiment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the method for searching file contents according to any embodiment.
Fig. 4 shows a more specific hardware architecture of an electronic device according to this embodiment, where the device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 implement communication connections therebetween within the device via a bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), microprocessor, application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc. for executing relevant programs to implement the technical solutions provided by the embodiments of the present application.
The Memory 1020 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), static storage device, dynamic storage device, or the like. Memory 1020 may store an operating system and other application programs, and when the embodiments of the present application are implemented in software or firmware, the associated program code is stored in memory 1020 and executed by processor 1010.
The input/output interface 1030 is used to connect an input/output module so that information can be input and output. The input/output module may be configured as a component within the device (not shown in the figure) or may be externally connected to the device to provide the corresponding functions. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various types of sensors, and the like, and the output devices may include a display, a speaker, a vibrator, indicator lights, and the like.
The communication interface 1040 is used to connect a communication module (not shown in the figure) so that the device can communicate with other devices. The communication module may communicate in a wired manner (such as USB or a network cable) or in a wireless manner (such as a mobile network, Wi-Fi, or Bluetooth).
Bus 1050 includes a path for transferring information between components of the device (e.g., processor 1010, memory 1020, input/output interface 1030, and communication interface 1040).
It should be noted that although only the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040, and the bus 1050 are shown for the above device, in a specific implementation the device may also include other components necessary for proper operation. Furthermore, those skilled in the art will understand that the above device may include only the components necessary to implement the embodiments of the present application, and need not include all the components shown in the figure.
The device of the foregoing embodiment is used to implement the corresponding method for retrieving file contents in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not repeated here.
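As a purely illustrative sketch of the kind of program the memory 1020 might store and the processor 1010 might execute, the following Python fragment walks through the retrieval steps for TXT files only. The names `CONTENT_READERS`, `read_txt`, `build_target_list`, and `search` are hypothetical and do not appear in the application; a plain reader function stands in for the "content object", and readers for Word, Excel, PPT, and PDF files are deliberately omitted because they would require libraries that the application does not specify.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def read_txt(path: Path) -> str:
    """Hypothetical content object for the TXT type: returns the file's text."""
    return path.read_text(encoding="utf-8", errors="replace")


# Assumed registry mapping a file extension to its content reader.
# Only TXT is registered here; other file types would be added the same way.
CONTENT_READERS: Dict[str, Callable[[Path], str]] = {".txt": read_txt}


def build_target_list(index: Iterable[Path], file_types: Iterable[str]) -> List[Path]:
    """Keep only indexed files whose type is requested and has a reader."""
    wanted = {t.lower() for t in file_types}
    return [
        p for p in index
        if p.suffix.lower() in wanted and p.suffix.lower() in CONTENT_READERS
    ]


def search(index: Iterable[Path], key_content: str, file_types: Iterable[str]) -> List[str]:
    """Return the sorted names of files whose text contains key_content."""
    results = []
    for path in build_target_list(index, file_types):
        text = CONTENT_READERS[path.suffix.lower()](path)
        if key_content in text:
            results.append(path.name)
    return sorted(results)
```

In this sketch the file index set is simply an iterable of paths (for example, `Path("docs").rglob("*")`), and the result list is the list of file names returned by `search`; displaying or opening a selected file is left out.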
Based on the same inventive concept, and corresponding to the method of any of the above embodiments, the present application further provides a non-transitory computer readable storage medium storing computer instructions for causing a computer to execute the method for retrieving file contents according to any of the above embodiments.
The computer readable media of this embodiment include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The computer instructions stored in the storage medium of the foregoing embodiment are used to cause a computer to execute the method for retrieving file contents according to any of the foregoing embodiments, and have the beneficial effects of the corresponding method embodiment, which are not repeated here.
Those of ordinary skill in the art will appreciate that the discussion of any of the above embodiments is merely exemplary and is not intended to suggest that the scope of the application (including the claims) is limited to these examples. Within the spirit of the application, technical features of the above embodiments, or of different embodiments, may also be combined, the steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the application as described above, which are not provided in detail for the sake of brevity.
Additionally, well-known power/ground connections to integrated circuit (IC) chips and other components may or may not be shown in the provided figures, in order to simplify the illustration and discussion and so as not to obscure the embodiments of the present application. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the embodiments of the present application, and also in view of the fact that the specifics of implementing such block diagram devices are highly dependent on the platform on which the embodiments of the present application are to be implemented (i.e., such specifics should be well within the purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the application, it should be apparent to one skilled in the art that the embodiments of the application can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative rather than restrictive.
While the application has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.
The embodiments of the application are intended to embrace all such alternatives, modifications, and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the embodiments of the application are intended to be included within the scope of protection of the application.