
GB2458490A - Displaying the summary of a text file - Google Patents


Info

Publication number
GB2458490A
Authority
GB
United Kingdom
Prior art keywords
text file
term
detected
terms
text
Prior art date
2008-03-20
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB0805156A
Other versions
GB0805156D0 (en)
Inventor
Ian Matthew Haynes
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Triad Group PLC
Original Assignee
Triad Group PLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2008-03-20
Filing date
2008-03-20
Publication date
2009-09-23
2008-03-20 Application filed by Triad Group PLC
2008-03-20 Priority to GB0805156A
2008-04-23 Publication of GB0805156D0
2009-09-23 Publication of GB2458490A
Status: Withdrawn (current legal status)


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9038 Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method of processing a text file comprises receiving a text file, detecting a plurality of terms within the text file, calculating the importance of each detected term within the text file, and displaying a generated summary of the text file. The displayed summary comprises one or more of the detected terms, with the text size of each detected term being in proportion to the calculated importance of the term. The importance can be calculated by identifying the frequency of each detected term within the file. Preferably, the summary of the text file comprises only the n most important detected terms, where n is an integer such as n = 3.

Description

PROCESSING A TEXT FILE
This invention relates to a method of, and system for, processing a text file.
It is known to provide a summary of a text file. For example, if an individual searches electronically through a database of files via a suitable searching interface, then the documents that are returned as the result of the search are commonly abbreviated to their title and/or possibly their abstract.
The title and abstract are usually created by the author of the document, either when the document is first created or when it is added to the database. In some circumstances, an abstract is composed by a specialist author, separate from the original author. All of the known techniques for producing summaries of documents rely on human input, which carries a cost and depends on the expertise of whoever writes the summary.
It is therefore an object of the invention to improve upon the known art.
According to a first aspect of the present invention, there is provided a method of processing a text file comprising receiving a text file, detecting a plurality of terms within the text file, calculating the importance of each detected term within the text file, generating a summary of the text file, the summary comprising one or more of the detected terms, and displaying the summary of the text file, the text size of each detected term being in proportion to the calculated importance of the term.
According to a second aspect of the present invention, there is provided a system for processing a text file comprising a processor arranged to receive a text file, to detect a plurality of terms within the text file, to calculate the importance of each detected term within the text file, and to generate a summary of the text file, the summary comprising one or more of the detected terms, and a display device arranged to display the summary of the text file, the text size of each detected term being in proportion to the calculated importance of the term.
According to a third aspect of the present invention, there is provided a computer program product on a computer readable medium for processing a text file, the product comprising instructions for receiving a text file, detecting a plurality of terms within the text file, calculating the importance of each detected term within the text file, generating a summary of the text file, the summary comprising one or more of the detected terms, and displaying the summary of the text file, the text size of each detected term being in proportion to the calculated importance of the term.
Owing to the invention, it is possible to provide an automated summary of a text file that is clear and concise and that also visibly conveys the importance of the detected terms within the original text file.
Advantageously, the step of calculating the importance of each detected term within the text file comprises calculating the frequency of each detected term within the text file. This method of determining the importance of a term within the text file is very robust and easily executed, as all that is needed is to count the occurrences of the specified terms within the text file. These terms are then displayed in the summary with a size that is dependent on the frequency with which they are found within the original text file. Other methods of determining the importance of the terms within the document are possible.
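As a minimal sketch of frequency as the importance measure, the Python below counts how often each specified term occurs in a text file. The function name `term_frequencies`, the case-insensitive matching and the simple word-tokenising rule are assumptions made for illustration, not details taken from the description.

```python
import re
from collections import Counter


def term_frequencies(text: str, terms: list[str]) -> dict[str, int]:
    """Importance = frequency: how often each specified term occurs in the text.

    Assumptions (illustrative only): terms are single words and matching
    ignores case. Characters '+' and '#' are kept so "C++" or "C#" survive.
    """
    # Split the lower-cased text into word tokens and count each one.
    tokens = Counter(re.findall(r"[a-z0-9+#]+", text.lower()))
    # A Counter returns 0 for missing keys, so absent terms get frequency 0.
    return {term: tokens[term.lower()] for term in terms}
```

For example, `term_frequencies("Java developer. Java, SQL and Oracle. More Java.", ["JAVA", "SQL", "ORACLE", "PHP"])` gives JAVA a frequency of 3, SQL and ORACLE 1 each, and PHP 0.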
For example, the context in which terms are used, or other detected components (such as numbers) that are linked to the terms, can be used to determine the relative (or absolute) importance of terms within the text file.
For example, if a database contains curricula vitae (CVs) of individuals, then when a user searches for particular skills, they are presented with a list of suitable candidates. Each candidate is shown with the full list of the skills contained in their CV. To emphasise the relative strength of each skill, as represented by the number of occurrences in the CV or the number of years' experience, the font size (or point size) changes: the more times a skill is mentioned in the CV, the larger the font.
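The years-of-experience alternative can be sketched in the same hedged way. The extraction rule used here (a number, then "year" or "years", then the skill name within at most three further words) is a hypothetical heuristic chosen for illustration; the description does not say how years are linked to skills.

```python
import re


def years_per_skill(text: str, skills: list[str]) -> dict[str, int]:
    """Context-based importance: years of experience linked to each skill.

    Hypothetical rule: "<N> year(s)" followed, within three words, by the
    skill name, e.g. "8 years of Oracle" or "5 years' experience in Java".
    If a skill is mentioned with several figures, the largest is kept.
    """
    result = {}
    for skill in skills:
        # (?!\w) rather than \b so skills ending in symbols (e.g. "C++") still match.
        pattern = rf"(\d+)\s+years?\W+(?:\w+\W+){{0,3}}?{re.escape(skill)}(?!\w)"
        years = [int(y) for y in re.findall(pattern, text, flags=re.IGNORECASE)]
        result[skill] = max(years, default=0)
    return result
```

A real CV parser would need far more robust rules; the point of the sketch is only that a number linked to a term can replace a raw count as the importance value.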
Preferably, the method further comprises accessing a list of terms for detecting within the text file. By providing a list of terms, relevant to the context of the text file, with which to search the file, there is no need for any complicated processing of the text file to determine the terms to be used in the summary. The text file is simply parsed to find the importance of the listed terms within the document (for example, using their frequency to determine their importance).
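A sketch of detection driven by a term list follows. The one-term-per-line list format, the `#` comment convention and the whole-term, case-insensitive matching are assumptions; only the idea of parsing the file for the listed terms comes from the description.

```python
import re


def load_term_list(lines: list[str]) -> list[str]:
    """Hypothetical list format: one term per line; blanks and '#' comments ignored."""
    return [ln.strip() for ln in lines if ln.strip() and not ln.strip().startswith("#")]


def detect_listed_terms(text: str, term_list: list[str]) -> dict[str, int]:
    """Count whole-term, case-insensitive occurrences of each listed term.

    Terms that do not occur are omitted, so the result holds only detected terms.
    """
    counts = {}
    for term in term_list:
        # The term must not be embedded in a longer word (e.g. "SQL" in "MySQL").
        pattern = rf"(?<!\w){re.escape(term)}(?!\w)"
        n = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if n:
            counts[term] = n
    return counts
```

The lookarounds `(?<!\w)` and `(?!\w)` are used instead of `\b` because `\b` cannot match after a trailing symbol, so a term such as "C++" followed by a space would never be found; multi-word terms such as "Project Management" also work.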
Ideally, the summary of the text file comprises only the n most important detected terms, where n is an integer, for example n = 3. In many situations it will be appropriate to limit the number of terms in the summary to prevent it from becoming too large and unwieldy. With a predetermined limit on the number of terms, only the most relevant information is provided to the user.
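Limiting the summary to the n most important terms amounts to a sort and a slice. In this hypothetical helper, ties are broken alphabetically so that the output is deterministic, and terms with zero importance are treated as not detected; both choices are assumptions.

```python
def top_n_terms(importance: dict[str, float], n: int = 3) -> list[tuple[str, float]]:
    """Return the n most important detected terms, most important first."""
    # Descending importance; equal scores fall back to alphabetical order.
    ranked = sorted(importance.items(), key=lambda kv: (-kv[1], kv[0]))
    # Only terms that were actually detected (importance > 0) may appear.
    return [(term, score) for term, score in ranked[:n] if score > 0]
```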
Advantageously, the step of displaying the summary of the text file comprises displaying the text size of each detected term in direct proportion to the calculated importance of the term. This is one way in which the summary can be presented to the user: the terms in the summary are sized in direct proportion to their calculated importance. For example, if a first term is mentioned eight times in the text file and a second term is mentioned four times, then in the summary the first term will have a text size twice that of the second term.
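The eight-versus-four example can be checked with a small sketch in which text size is `k × importance`. Choosing `k` so that the most important term is drawn at `max_pt` (24 pt by default) is an assumption; any positive constant preserves the 2:1 ratio.

```python
def font_sizes(importance: dict[str, float], max_pt: float = 24.0) -> dict[str, float]:
    """Point size directly proportional to importance: size = k * importance.

    Assumption: k is chosen so that the most important term gets max_pt.
    """
    if not importance:
        return {}
    top = max(importance.values())
    if top <= 0:
        raise ValueError("at least one term must have positive importance")
    k = max_pt / top  # points per unit of importance
    return {term: k * value for term, value in importance.items()}
```

With eight and four mentions, `k` is 3 pt per mention, giving 24 pt and 12 pt. Strict proportionality can make rarely mentioned terms unreadably small; a practical display might impose a minimum size, at the cost of exact proportionality.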
Embodiments of the present invention will now be described, by way of example only, with reference to the accompanying drawings, in which: Figure 1 is a schematic diagram of a system for processing a text file, Figure 2 is a schematic diagram of the processing of the text file, Figure 3 is a flowchart of the method of processing the text file, and Figure 4 is a schematic diagram of a graphical display on a display device.
Figure 1 illustrates a system which can be used to process text files.
The system of this Figure is a conventional personal computer (PC) used in a desktop environment, but could equally be a networked workstation for example. The system comprises a display device 10, which can be any suitable display for viewing documents, such as a CRT display or flat panel display capable of displaying text. The system also comprises a processing component 12, which in turn comprises a large number of processing, storage and input/output elements. Two such elements are illustrated, being a processor 14 and a database 16. The processor 14 is arranged to carry out process tasks and to control the image shown by the display device 10. The database 16 is a local storage device that stores information for use by the processor 14. In the Figure, the database 16 is shown connected to the processor 14 by a local bus. The system also includes conventional user interface devices 18, being a keyboard 18a and a mouse 18b.
The processor 14 has access to one or more text files. These could be stored locally by the database 16, or they could be accessed via a network connection to a remote database. For example, a user of the system of Figure 1 may wish to carry out a search of CVs in a remote database. These CVs are represented as individual text files within the remote database. As text files are recalled from the remote database, they are processed by the processor 14 which generates a summary of each text file that is returned, for display to the user.
Figure 2 illustrates schematically the processing of a text file 20 by the processor 14. The file 20, as discussed above, can be recalled locally from an existing database or received from a remote database. The processor 14 is arranged to receive the text file 20 and to detect a plurality of terms within it. In one embodiment, the processor 14 derives the terms to be detected directly from the document itself, for example simply by looking for the most common terms within the document 20. In the embodiment illustrated in Figure 2, the local database 16 instead stores a list 24 which specifies the terms 26 that are to be detected within the file 20. For example, if the file 20 is a CV, then the list 24 will specify precise terms 26 to be detected within the text file 20, such as skill labels.
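The embodiment in which terms are derived from the document itself might look like the sketch below. The tiny stop-word list and the token pattern are hypothetical placeholders; the description says only that the most common terms may be used.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list; a real system would need a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "with", "for", "on", "is", "was", "my"}


def derive_terms(text: str, k: int = 5) -> list[tuple[str, int]]:
    """Derive candidate terms directly from the document: its k most common words."""
    words = re.findall(r"[a-z][a-z0-9+#]*", text.lower())
    # Drop stop words and single letters, then count what remains.
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 1)
    # most_common orders by count; equal counts keep first-seen order.
    return counts.most_common(k)
```

Without a term list, generic words such as "used" can rank alongside real skills, which is one reason a curated list 24 is attractive.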
Once the processor 14 has detected the plurality of terms within the file 20, then the processor 14 is arranged to calculate the importance of the detected terms. The importance can be defined in many different ways. One simple way is for the processor 14 to calculate the frequency of each detected term 26 within the file 20. Other methods, for example in the CV embodiment, might relate to the number of years that are linked to a specific job or skill.
Once the processor 14 has calculated the importance of the terms 26, the processor 14 is arranged to generate a summary 22 of the text file 20, the summary 22 comprising one or more of the detected terms 26.
The method of processing the text file 20 is summarised in Figure 3. It comprises, firstly, at step S1, receiving the text file. The second step S2 is detecting the plurality of terms within the text file 20, and this is followed by step S3, calculating the importance of each detected term within the text file 20. As discussed above, a simple way of working out the importance of the detected terms is to work out the frequency with which those terms appear in the file 20. The next step S4 is generating the summary 22 of the text file 20, the summary 22 comprising one or more of the detected terms. Finally, at step S5, the system is arranged to display the summary 22 of the text file 20, the text size of each detected term being in proportion to the calculated importance of the term.
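Steps S1 to S5 can be strung together in one hedged sketch. It assumes a term list is supplied, uses frequency as importance, keeps the three most important terms and renders the summary as HTML with inline font sizes. The HTML output and the name `summarise_to_html` are illustrative assumptions; the description does not specify a display technology.

```python
import html
import re


def summarise_to_html(text: str, term_list: list[str], n: int = 3, max_pt: float = 24.0) -> str:
    # S1: the received text file is the `text` argument.
    # S2: detect the listed terms as case-insensitive whole-term matches.
    # S3: the importance of each detected term is its frequency.
    counts = {}
    for term in term_list:
        hits = re.findall(rf"(?<!\w){re.escape(term)}(?!\w)", text, flags=re.IGNORECASE)
        if hits:
            counts[term] = len(hits)
    # S4: the summary keeps the n most important detected terms.
    summary = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
    if not summary:
        return "<p></p>"
    # S5: font size in direct proportion to importance (largest term = max_pt).
    k = max_pt / summary[0][1]
    spans = [f'<span style="font-size:{k * c:g}pt">{html.escape(t)}</span>' for t, c in summary]
    return "<p>" + " ".join(spans) + "</p>"
```

For a CV mentioning Oracle three times, SQL twice and Java once, the three spans come out at 24 pt, 16 pt and 8 pt respectively.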
The method of Figure 3 may also include accessing the list 24 of terms for detecting within the text file 20. This extra step takes place between steps S1 and S2 of Figure 3. The processor 14 is arranged to execute steps S1 to S4 in turn for the file 20. The processor 14 may concurrently or consecutively process more than one text file 20, depending upon the application that is being used to generate the summary 22. For example, if the user is executing a query (such as an SQL query) against a database, then each of the hits returned by the query will be processed according to the flowchart of Figure 3.
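For the case where each hit of an SQL query is processed according to Figure 3, a sketch using Python's built-in `sqlite3` module is shown. The `candidates` table, its columns and the sample rows are hypothetical, and the query is assumed to return `(name, text)` pairs.

```python
import re
import sqlite3


def importance(text: str, terms: list[str]) -> dict[str, int]:
    """Frequency of each listed term (case-insensitive, whole terms); absent terms omitted."""
    found = {}
    for term in terms:
        count = len(re.findall(rf"(?<!\w){re.escape(term)}(?!\w)", text, flags=re.IGNORECASE))
        if count:
            found[term] = count
    return found


def summarise_query_hits(conn: sqlite3.Connection, sql: str, params: tuple,
                         terms: list[str], n: int = 3) -> list[tuple[str, list[tuple[str, int]]]]:
    """Run the query and summarise every hit; each row is assumed to be (name, text)."""
    summaries = []
    for name, text in conn.execute(sql, params):
        ranked = sorted(importance(text, terms).items(), key=lambda kv: (-kv[1], kv[0]))
        summaries.append((name, ranked[:n]))
    return summaries


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical schema and data, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE candidates (name TEXT, town TEXT, cv TEXT)")
    conn.executemany(
        "INSERT INTO candidates VALUES (?, ?, ?)",
        [
            ("Candidate 1", "Leeds", "Oracle DBA; Oracle tuning; SQL"),
            ("Candidate 2", "Leeds", "Java and SQL; SQL reports; PHP"),
            ("Candidate 3", "York", "PHP"),
        ],
    )
    return conn
```

Querying by town, for instance, returns only the matching candidates, and each returned CV is summarised in turn.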
An example of the type of result that would be generated using the system of Figure 1, according to the methodology of Figures 2 and 3, is shown in Figure 4, which shows an application window 28 on the display device 10.
The user has made a search, for example, for candidates within a specific geographic locality. The search has returned three results, candidates 1 to 3.
Each candidate has a text file (their CV) associated with them, but the processor 14 has generated the respective summaries 22a, 22b and 22c for each of the text files returned. These summaries 22 can be generated as and when they are needed, or they can be pre-generated and simply recalled from a suitable storage medium. The layout shown on the screen 10 is only one example; a wide variety of different arrangements is possible. It is sufficient that at least one summary 22 is displayed to the user via the display device 10.
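Pre-generating summaries and recalling them later can be sketched as a small cache keyed by a hash of the text, so that an unchanged file is summarised only once whether it is requested on demand or prepared in advance. The `SummaryStore` class and the SHA-256 key are assumptions; the description leaves the storage medium open.

```python
import hashlib
import re


class SummaryStore:
    """Hypothetical summary store: each summary is built once, then recalled."""

    def __init__(self, terms: list[str], n: int = 3):
        self.terms = terms
        self.n = n
        self._cache: dict[str, list[tuple[str, int]]] = {}  # SHA-256 of text -> summary
        self.generated = 0  # number of summaries actually computed

    def _generate(self, text: str) -> list[tuple[str, int]]:
        # Frequency of each listed term, keeping the n most important detected terms.
        counts = {}
        for term in self.terms:
            hits = re.findall(rf"(?<!\w){re.escape(term)}(?!\w)", text, flags=re.IGNORECASE)
            if hits:
                counts[term] = len(hits)
        self.generated += 1
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[: self.n]

    def summary(self, text: str) -> list[tuple[str, int]]:
        # On demand: compute only if this exact text has not been summarised before.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._generate(text)
        return self._cache[key]

    def pregenerate(self, texts: list[str]) -> None:
        # Ahead of time: fill the cache so later requests are simple look-ups.
        for text in texts:
            self.summary(text)
```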
The advantage of the system derives from the fact that, in displaying the summary 22 of the text files, the text size of each detected term is in proportion to the calculated importance of the term. In the summaries 22 shown in Figure 4, the terms "ORACLE", "SQL", "JAVA" and "PHP" have been detected in the files that make up the CVs of the different candidates. These terms could have been recalled from the list 24, as described above, or could have been detected directly in the text files. Each term is displayed at a size that reflects its calculated importance. For example, it is clear that, of the three candidates, candidate 1 has the greatest importance attached to the term "ORACLE". Similar judgements can readily be made about the other terms. The processing carried out by the processor 14 supports the display of the summaries shown in the Figure: the displayed size of each term depends upon the importance (such as the number of occurrences) assigned to it by the processor 14.
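Judgements across candidates, such as candidate 1 showing the largest "ORACLE", are only reliable if every summary on screen shares the same size scale. The description does not say how the scale is chosen; this sketch assumes one constant for all summaries, set so that the single most important term anywhere on screen is drawn at `max_pt`. The function names and data are hypothetical.

```python
def shared_scale_sizes(summaries: dict[str, dict[str, int]],
                       max_pt: float = 24.0) -> dict[str, dict[str, float]]:
    """Size every term on screen with one constant k, so sizes are comparable across candidates."""
    top = max((v for terms in summaries.values() for v in terms.values()), default=0)
    if top <= 0:
        return {cand: {t: 0.0 for t in terms} for cand, terms in summaries.items()}
    k = max_pt / top  # points per unit of importance, shared by all summaries
    return {cand: {t: k * v for t, v in terms.items()} for cand, terms in summaries.items()}


def candidate_with_largest(sizes: dict[str, dict[str, float]], term: str) -> str:
    """Return the candidate whose summary shows `term` at the largest size (absent counts as 0)."""
    return max(sizes, key=lambda cand: sizes[cand].get(term, 0.0))
```

Scaling each summary independently would draw every candidate's top term at the same size, hiding exactly the cross-candidate differences that Figure 4 is meant to reveal.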

Claims (15)

  1. A method of processing a text file comprising receiving a text file, detecting a plurality of terms within the text file, calculating the importance of each detected term within the text file, generating a summary of the text file, the summary comprising one or more of the detected terms, and displaying the summary of the text file, the text size of each detected term being in proportion to the calculated importance of the term.
  2. A method according to claim 1, wherein the step of calculating the importance of each detected term within the text file comprises calculating the frequency of each detected term within the text file.
  3. A method according to claim 1 or 2, and further comprising accessing a list of terms for detecting within the text file.
  4. A method according to claim 1, 2 or 3, wherein the summary of the text file comprises only the n most important detected terms, where n is an integer.
  5. A method according to any preceding claim, wherein the step of displaying the summary of the text file comprises displaying the text size of each detected term in direct proportion to the calculated importance of the term.
  6. A system for processing a text file comprising a processor arranged to receive a text file, to detect a plurality of terms within the text file, to calculate the importance of each detected term within the text file, and to generate a summary of the text file, the summary comprising one or more of the detected terms, and a display device arranged to display the summary of the text file, the text size of each detected term being in proportion to the calculated importance of the term.
  7. A system according to claim 6, wherein the processor is further arranged, when calculating the importance of each detected term within the text file, to calculate the frequency of each detected term within the text file.
  8. A system according to claim 6 or 7, and further comprising a database arranged to store a list of terms for detecting within the text file.
  9. A system according to claim 6, 7 or 8, wherein the summary of the text file comprises only the n most important detected terms, where n is an integer.
  10. A system according to any one of claims 6 to 9, wherein the display device is further arranged, when displaying the summary of the text file, to display the text size of each detected term in direct proportion to the calculated importance of the term.
  11. A computer program product on a computer readable medium for processing a text file, the product comprising instructions for receiving a text file, detecting a plurality of terms within the text file, calculating the importance of each detected term within the text file, generating a summary of the text file, the summary comprising one or more of the detected terms, and displaying the summary of the text file, the text size of each detected term being in proportion to the calculated importance of the term.
  12. A computer program product according to claim 11, wherein the instructions for calculating the importance of each detected term within the text file comprise instructions for calculating the frequency of each detected term within the text file.
  13. A computer program product according to claim 11 or 12, and further comprising instructions for accessing a list of terms for detecting within the text file.
  14. A computer program product according to claim 11, 12 or 13, wherein the summary of the text file comprises only the n most important detected terms, where n is an integer.
  15. A computer program product according to any one of claims 11 to 14, wherein the instructions for displaying the summary of the text file comprise instructions for displaying the text size of each detected term in direct proportion to the calculated importance of the term.

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB0805156A GB2458490A (en) 2008-03-20 2008-03-20 Displaying the summary of a text file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB0805156A GB2458490A (en) 2008-03-20 2008-03-20 Displaying the summary of a text file

Publications (2)

Publication Number Publication Date
GB0805156D0 (en) 2008-04-23
GB2458490A (en) 2009-09-23

Family

ID=39356781

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0805156A Withdrawn GB2458490A (en) 2008-03-20 2008-03-20 Displaying the summary of a text file

Country Status (1)

Country Link
GB (1) GB2458490A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5982369A (en) * 1997-04-21 1999-11-09 Sony Corporation Method for displaying on a screen of a computer system images representing search results
GB2365300A (en) * 2000-06-07 2002-02-13 David Meakes Displaying search results according to relevance to query
US20020124026A1 (en) * 2001-03-05 2002-09-05 Weber David J. Methods and apparata for enhancing text to increase reading speed and comprehension
WO2003009583A2 (en) * 2001-07-19 2003-01-30 Koninklijke Philips Electronics N.V. Method and apparatus for providing a user interface
US20030126601A1 (en) * 2001-12-31 2003-07-03 Koninklijke Philips Electronics N.V. Visualization of entertainment content
US7003505B1 (en) * 1999-01-29 2006-02-21 Canon Kabushiki Kaisha Information retrieving apparatus and method therefor, and memory medium storing program therefor
US20080027933A1 (en) * 1999-10-20 2008-01-31 Araha, Inc. System and method for location, understanding and assimilation of digital documents through abstract indicia


Also Published As

Publication number Publication date
GB0805156D0 (en) 2008-04-23

Similar Documents

Publication Publication Date Title
US11995091B2 (en) Website scoring system
US7996378B2 (en) System and method for graphically distinguishing levels of a multidimensional database
US9418336B2 (en) Automatic recognition and insights of data
US9569541B2 (en) Evaluating preferences of content on a webpage
US9311400B2 (en) Method and system for providing time-dependent search results for repetitively performed searches
US8990242B2 (en) Enhanced query suggestions in autosuggest with corresponding relevant data
US7262772B2 (en) Visual content summary
US10175954B2 (en) Method of processing big data, including arranging icons in a workflow GUI by a user, checking process availability and syntax, converting the workflow into execution code, monitoring the workflow, and displaying associated information
US20170212944A1 (en) Automated computer visualization and interaction with big data
US10860675B2 (en) Informational tabs
US11769006B2 (en) Parsing and reflowing infographics using structured lists and groups
US11531723B2 (en) Dynamic contextual library
JP7278324B2 (en) Test method, device, storage medium, and program for map search of electronic map
RU2010114245A (en) GENERAL MODEL EDITING SYSTEM
US20130080419A1 (en) Automatic information presentation of data and actions in search results
CN104598538A (en) Information searching method and device
US20080071738A1 (en) Method and apparatus of visual representations of search results
US20130239026A1 (en) Multi-dimensional content delivery mechanism
WO2017001944A1 (en) Method, system and computer readable memory for generating ranked search results incorporating suggests
US10296624B2 (en) Document curation
EP3942499A1 (en) Analyzing resumes and highlighting non-traditional resumes
GB2458490A (en) Displaying the summary of a text file
US20220350777A1 (en) Document search system, document search method, and computer-readable storage medium
US20170097991A1 (en) Automatically branding topics using color
CN107704174A (en) A kind of window grab method and system, computer installation and memory

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)