
US20200084238A1 - Generating highly realistic decoy email and documents - Google Patents


Info

Publication number
US20200084238A1
Authority
US
United States
Prior art keywords
fake
files
user
website
alert
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US16/680,873
Inventor
Salvatore J. Stolfo
Carl Sable
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Allure Security Technology Inc
Original Assignee
Allure Security Technology Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Allure Security Technology Inc
Priority to US16/680,873
Publication of US20200084238A1
Abandoned (current legal status)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1491 Countermeasures against malicious traffic using deception as countermeasure, e.g. honeypots, honeynets, decoys or entrapment
    • G06F17/2264
    • G06F17/278
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N7/005
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Definitions

  • a sample training corpus may consist of 94 news documents from the publicly available Information Extraction: Entity Recognition (IEER) corpus and 100 randomly selected e-mails from the Enron e-mail dataset.
  • Cross-validation experiments may be performed within the training set to evaluate the chunker's accuracy detecting dates, times, and people using standard metrics from the field of natural language processing (NLP).
  • the metrics used may include recall, which indicates the percentage of actual tokens from the category that are correctly predicted to belong to the category; precision, which indicates the percentage of predicted tokens assigned to the category that actually do belong to the category; and F1, which combines recall and precision into a single metric that is closer to the lower of the two.
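The three metrics can be made concrete with a short sketch. It assumes, as the wording above suggests, that scoring is done per token and per category, so B- and I- labels of the same category both count toward that category; `category_of` and `token_prf` are hypothetical names.

```python
from typing import Sequence, Tuple


def category_of(label: str) -> str:
    """Map a BIO label such as "B-PERSON" or "I-PERSON" to its category; "O" stays "O"."""
    return label[2:] if label[:2] in ("B-", "I-") else label


def token_prf(gold: Sequence[str], pred: Sequence[str], category: str) -> Tuple[float, float, float]:
    """Token-level precision, recall, and F1 for one category."""
    gold_in = [category_of(g) == category for g in gold]
    pred_in = [category_of(p) == category for p in pred]
    true_pos = sum(g and p for g, p in zip(gold_in, pred_in))
    n_pred, n_gold = sum(pred_in), sum(gold_in)
    precision = true_pos / n_pred if n_pred else 0.0  # predicted tokens that really belong
    recall = true_pos / n_gold if n_gold else 0.0  # actual tokens that were found
    # Harmonic mean: always pulled toward the lower of the two values.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With precision 1.0 and recall 0.5, F1 is about 0.67, below the arithmetic mean of 0.75, which is what "closer to the lower of the two" means in practice.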
  • a typical user should never need to retrain the chunker.
  • the system allows the user to train their own chunker, and to specify that chunker to be used by the system in place of a default chunker (which, for example, may have been trained using the training set and methodology indicated above).
  • a graphical user interface may be implemented and shall be referred to herein as the Named Entity Labeler.
  • the term “named entity” is used to represent the concepts that are detected by this sort of approach, including concepts such as dates, times, etc.
  • A screenshot of our Named Entity Labeler being used to label one of the e-mails in a training set is shown in FIG. 2.
  • the user can select the type of chunk from a “Chunk” menu, and then select portions of text that match that chunk.
  • the user may click the left button of a computer mouse to label a selection as an example of the chunk, and the graphical user interface (GUI) may automatically expand the selection to make sure it includes complete tokens.
  • the user may click the right button of a computer mouse to select a previously labeled chunk to delete the label.
  • the file may be saved in two formats.
  • One format may have the extension .nel, and comprise a text file with metadata that the GUI uses to indicate current labels.
  • Another format may be a .train file, which has the appropriate format for training the chunker.
  • FIG. 3 shows a screenshot of part of a computer screen indicating how the chunker may be trained.
  • the first parameter indicates the name to give a pickled chunker, wherein “pickling” is how Python applications typically implement object serialization.
  • the second parameter is a text file that contains the full paths and filenames of documents to be used for training.
  • the function applied in FIG. 3, “create date chunker,” may be used to create a chunker that recognizes dates. Alternatively, the function may create a chunker that is capable of predicting all concepts labeled in the training files. On a typical machine, the entire training process may take only a few seconds. Additional Python scripts may be implemented to perform cross-validation experiments within the training set to automatically compute and display evaluation metrics for all concepts (not shown here).
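The text does not name the Bayesian algorithm behind the chunker. As a minimal sketch, assuming a naive Bayes classifier over per-token feature dictionaries (NLTK's `NaiveBayesClassifier` would be a natural real-world choice), training and pickling could look like the following. `NaiveBayesTagger`, `train_and_pickle`, and the `load_examples` callback that would read a .train file are hypothetical names; the two parameters of `train_and_pickle` mirror the two parameters described for FIG. 3.

```python
import math
import pickle
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

Features = Dict[str, Hashable]


class NaiveBayesTagger:
    """Categorical naive Bayes over per-token feature dicts, with add-one smoothing."""

    def __init__(self) -> None:
        self.label_counts: Counter = Counter()
        # (label, feature name) -> Counter of feature values seen with that label
        self.counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
        # feature name -> set of values seen in training
        self.values: Dict[str, set] = defaultdict(set)

    def train(self, examples: Iterable[Tuple[Features, str]]) -> "NaiveBayesTagger":
        for feats, label in examples:
            self.label_counts[label] += 1
            for name, value in feats.items():
                self.counts[(label, name)][value] += 1
                self.values[name].add(value)
        return self

    def classify(self, feats: Features) -> Optional[str]:
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            score = math.log(n / total)  # log prior P(label)
            for name, value in feats.items():
                if name not in self.values:
                    continue  # feature never seen in training: ignore it
                seen = self.counts.get((label, name), Counter())[value]
                # Add-one smoothing; the extra +1 reserves mass for an unseen value.
                score += math.log((seen + 1) / (n + len(self.values[name]) + 1))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_and_pickle(pickle_path: str, list_path: str,
                     load_examples: Callable[[str], List[Tuple[Features, str]]]) -> NaiveBayesTagger:
    """Train from the documents listed (one path per line) in list_path, then pickle the result."""
    with open(list_path, encoding="utf-8") as fh:
        paths = [line.strip() for line in fh if line.strip()]
    tagger = NaiveBayesTagger().train(ex for path in paths for ex in load_examples(path))
    with open(pickle_path, "wb") as fh:
        pickle.dump(tagger, fh)  # "pickling" = Python object serialization
    return tagger
```

At tagging time, the previous token's predicted label would be fed back in as a feature while moving left to right through the sentence.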
  • When the DGS system processes e-mails, attachments, or other files, it may first extract the textual content from the document, then segment the text into sentences, then tokenize each sentence (i.e., split the sentence into words plus important punctuation), then compute the part-of-speech (i.e., syntactic category) for each token, then compute other features used for learning, and then apply the chunker to detect recognized concepts (e.g., dates, times, names of people, locations). For each predicted concept, a second phase may be applied to eliminate false positives. Then each date and time may be shifted according to deltas specified by the user (this makes use of Python's datetime module).
  • Names of people matching .pshift files provided by the user may also be modified according to the user-provided templates, as explained below. After all shifts are applied, the document may be reconstructed and saved in the proper destination folder.
  • a simplified outline explaining the system workflow for processing a single e-mail or file is shown in FIG. 4 .
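The steps of this workflow can be outlined as a small driver. This is a sketch under stated assumptions: regular expressions stand in for NLTK's sentence splitter and tokenizer, POS tagging and feature extraction are folded into the `chunker` callback, and `process_text`, `verify`, and `shift` are hypothetical names. Keeping character offsets for every token lets the replacements be written back into the original text, so whitespace and punctuation survive.

```python
import re
from typing import Callable, List, Tuple

Chunk = Tuple[str, int, int]  # (category, first token index, index one past the last token)


def sentence_spans(text: str) -> List[Tuple[int, int]]:
    # Crude stand-in for NLTK sentence segmentation: split after ., !, or ?.
    return [m.span() for m in re.finditer(r"[^.!?]+(?:[.!?]+|$)", text)]


def token_spans(sentence: str) -> List[Tuple[str, Tuple[int, int]]]:
    # Crude stand-in for NLTK tokenization: words or single punctuation marks, with offsets.
    return [(m.group(), m.span()) for m in re.finditer(r"\w+|[^\w\s]", sentence)]


def process_text(text: str,
                 chunker: Callable[[List[str]], List[Chunk]],
                 verify: Callable[[str, str], bool],
                 shift: Callable[[str, str], str]) -> str:
    edits = []
    for s_start, s_end in sentence_spans(text):
        toks = token_spans(text[s_start:s_end])
        words = [w for w, _ in toks]
        # A real chunker would also receive POS tags and the other learning features.
        for category, i, j in chunker(words):
            start = s_start + toks[i][1][0]
            end = s_start + toks[j - 1][1][1]
            original = text[start:end]
            if verify(category, original):  # second phase: drop likely false positives
                edits.append((start, end, shift(category, original)))
    # Apply replacements right to left so earlier character offsets stay valid.
    for start, end, new in sorted(edits, reverse=True):
        text = text[:start] + new + text[end:]
    return text
```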
  • Retrieving the text from a file, represented by the first box in the outline, may be more complicated for some file types than others.
  • for e-mails, which are stored in .json format, the Python json module can be used to obtain and potentially modify the various fields.
  • Text files are also simple to deal with.
  • the system may handle HTML-formatted e-mails (and other .html files, if any), .docx attachments and files, and .pdf attachments and files. Handling HTML and .docx files is similar, because .docx files are stored as compressed XML documents in which specific tags indicate textual fields; Python's lxml module is useful for handling both formats. Complications can still arise because sentences may be split across HTML or XML nodes.
  • the system may restore all modified tokens to their original nodes to preserve formatting. It is difficult, however, to manipulate .pdf files directly.
  • the system may therefore rely on publicly available utilities to convert .pdf files to .html, process the .html, and convert the file back to .pdf.
  • the conversion is not perfect, so formatting of .pdf files is only approximately preserved. Any other file type, either as an attachment or standalone file, is copied to the destination directory unmodified.
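For .docx files, the node-level bookkeeping could be sketched as follows. It uses the standard library's `zipfile` and `xml.etree.ElementTree` instead of lxml so it runs without extra packages, and the helper names are hypothetical. Paragraph text is assembled from its `<w:t>` run nodes; a replacement that spans two runs is written into the first run and trimmed from the next, so each run keeps its formatting. Writing the XML back is deliberately omitted: ElementTree may rename or drop namespace prefixes that Word documents rely on, which is one reason lxml is the safer choice for saving.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import List, Tuple

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
P_TAG, T_TAG = f"{{{W_NS}}}p", f"{{{W_NS}}}t"


def docx_paragraphs(docx_bytes: bytes) -> Tuple[ET.Element, List[Tuple[str, List[ET.Element]]]]:
    """Return the parsed word/document.xml root and, per paragraph, its text plus its <w:t> nodes."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(P_TAG):
        nodes = list(p.iter(T_TAG))
        paragraphs.append(("".join(n.text or "" for n in nodes), nodes))
    return root, paragraphs


def replace_in_nodes(nodes: List[ET.Element], start: int, end: int, new: str) -> None:
    """Replace paragraph characters [start, end) in place, keeping run boundaries.

    The new text goes into the first run the span touches; the rest of the span
    is cut out of the following runs.
    """
    pos, placed = 0, False
    for node in nodes:
        text = node.text or ""
        n_start, n_end = pos, pos + len(text)
        pos = n_end
        lo, hi = max(start, n_start), min(end, n_end)
        if lo >= hi:
            continue  # this run does not overlap the span
        node.text = text[:lo - n_start] + ("" if placed else new) + text[hi - n_start:]
        placed = True
```

Multiple replacements in one paragraph would be applied right to left so earlier offsets stay valid.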
  • Shifting dates and times, once predicted and verified, may be achieved using Python's datetime module (examples are described below).
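Date and time shifting with Python's datetime module can be sketched as follows, assuming the format of each detected string is already known (the text says dates are verified with dateutil, but not how a format is chosen for re-rendering). The function names are hypothetical; the deltas used below match the example run of -500 days and +630 minutes.

```python
from datetime import datetime, timedelta
from typing import Tuple


def shift_date_text(text: str, fmt: str, days: int) -> str:
    """Shift a date string by `days`, re-rendering it in its original format."""
    return (datetime.strptime(text, fmt) + timedelta(days=days)).strftime(fmt)


def shift_time_text(text: str, fmt: str, minutes: int) -> str:
    """Shift a time-of-day string by `minutes`; with no date shown, it wraps past midnight."""
    return (datetime.strptime(text, fmt) + timedelta(minutes=minutes)).strftime(fmt)


def shift_timestamp(date_text: str, time_text: str, date_fmt: str, time_fmt: str,
                    days: int, minutes: int) -> Tuple[str, str]:
    """Shift a date/time pair (such as DateSent/TimeSent) together, so crossing midnight moves the date."""
    dt = datetime.strptime(f"{date_text} {time_text}", f"{date_fmt} {time_fmt}")
    dt += timedelta(days=days, minutes=minutes)
    return dt.strftime(date_fmt), dt.strftime(time_fmt)
```

For example, `shift_date_text("May 14, 2001", "%B %d, %Y", -500)` gives "December 31, 1999". Wrapping bare times past midnight and shifting paired fields together are design assumptions of this sketch.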
  • An example of a .pshift file specifying rules for shifting variations of the name Ken Lay is shown in FIG. 5.
  • a detected name predicted by the chunker must match at least the first and last name as specified in the template.
  • Matches for the middle name are allowed but not required (but a middle name that is present and does not match the template would exclude the match).
  • each part of the name would be shifted as indicated in the template.
  • Templates may be case insensitive and flexible with respect to whitespace. The system may attempt to use the same style of capitalization for shifted tokens as for original tokens. Therefore, in the example embodiment shown in FIG. 5, “Ken Lay” would become “John Public”, “KENNETH L. LAY” would become “JOHNNY Q. PUBLIC”, “Lay, Kenneth Lee” would become “Public, Johnny Quin”, etc.
  • Names such as “Ken” or “Lay” by themselves would not be shifted, since they do not match all required fields according to the first row of the template. Also, a name such as “Ken A. Lay” would not be shifted, since the middle initial “A” does not match the allowable middle names in the template, even though that field is not required.
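The matching rules for the Ken Lay example can be sketched in a few functions. The actual .pshift syntax appears only in FIG. 5, so the dictionary below is a hypothetical in-memory stand-in for its rules, and the function names are illustrative. This sketch hard-codes first and last names as required; the Wanda Curry and Carl Reiber templates described later relax that, which a fuller version would read from the template itself.

```python
from typing import Dict, Optional

# Hypothetical in-memory form of the FIG. 5 rules (lowercase keys make matching case insensitive).
KEN_LAY: Dict[str, Dict[str, str]] = {
    "first": {"ken": "John", "kenneth": "Johnny"},
    "middle": {"l": "Q", "lee": "Quin"},
    "last": {"lay": "Public"},
}


def match_case(original: str, replacement: str) -> str:
    """Copy the original token's capitalization style onto the replacement."""
    if original.isupper():
        return replacement.upper()
    if original.islower():
        return replacement.lower()
    return replacement  # replacements are stored capitalized


def swap(part: str, table: Dict[str, str]) -> Optional[str]:
    """Shift one name part, keeping a trailing '.' on initials; None if the part does not match."""
    core = part.rstrip(".")
    new = table.get(core.lower())
    return None if new is None else match_case(core, new) + part[len(core):]


def shift_name(name: str, template: Dict[str, Dict[str, str]] = KEN_LAY) -> Optional[str]:
    """Return the shifted name, or None when the detected name does not match the template."""
    if "," in name:  # "Last, First [Middle]"
        last, rest = (s.strip() for s in name.split(",", 1))
        given, inverted = rest.split(), True
    else:  # "First [Middle] Last"; split() tolerates extra whitespace
        parts = name.split()
        if len(parts) < 2:
            return None
        given, last, inverted = parts[:-1], parts[-1], False
    if not 1 <= len(given) <= 2:
        return None
    new_first, new_last = swap(given[0], template["first"]), swap(last, template["last"])
    if new_first is None or new_last is None:
        return None  # first and last names are both required
    new_given = [new_first]
    if len(given) == 2:
        new_middle = swap(given[1], template["middle"])
        if new_middle is None:
            return None  # a middle name that is present must match
        new_given.append(new_middle)
    return f"{new_last}, {' '.join(new_given)}" if inverted else f"{' '.join(new_given)} {new_last}"
```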
  • believable fake e-mails are generated and inserted into a user's destination e-mail directory.
  • the system may be limited to at most one fake e-mail generated per user.
  • the content of the fake e-mails is based on configurable templates, and each template is applied at most once during a single run of the DGS system.
  • Each generated fake e-mail may contain fake credentials.
  • the fake e-mails are designed to entice a hacker who steals data into using the fake credentials at a fake website. Victims are automatically notified via e-mail when fake credentials have been used, indicating that their data has been stolen.
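The alerting side can be sketched as a lookup the fake website performs on each login attempt. `DecoyCredentialRegistry` and the sender address are hypothetical, and matching on the exact username and password pair is an assumption; the alert itself is built with the standard library's `email.message.EmailMessage`.

```python
from email.message import EmailMessage
from typing import Dict, Optional, Tuple


class DecoyCredentialRegistry:
    """Remembers which decoy credentials were planted for which user (hypothetical helper)."""

    def __init__(self) -> None:
        self._issued: Dict[Tuple[str, str], str] = {}  # (username, password) -> address to notify

    def issue(self, username: str, password: str, notify: str) -> None:
        self._issued[(username, password)] = notify

    def check_login(self, username: str, password: str, source_ip: str) -> Optional[EmailMessage]:
        """Return an alert message if planted credentials were used, else None."""
        notify = self._issued.get((username, password))
        if notify is None:
            return None  # not a planted credential pair
        alert = EmailMessage()
        alert["From"] = "decoy-alerts@example.com"  # hypothetical sender address
        alert["To"] = notify
        alert["Subject"] = "Alert: decoy credentials were used"
        alert.set_content(
            f"The decoy login {username!r} was used from {source_ip}. "
            "These credentials appeared only in a decoy e-mail, so your data has likely been stolen."
        )
        return alert
```

Delivering the alert could then be a call such as `smtplib.SMTP(host).send_message(alert)`, with the mail host taken from configuration.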
  • FIG. 6 shows the content of one generated fake e-mail.
  • the general format of the file is .json, matching the format of the original e-mails as specified by BAE.
  • Each template may be different.
  • the “From” field has been taken from a real e-mail of the same user; the “To” field contains a fake Gmail address based on the user's username; the body is mostly fixed, except for the username and password.
  • the values of the “Cc” and “Bcc” fields may be predetermined as null; the value of the “HasAttachments” field may be predetermined as false; the value of the “Id” field is a randomly modified version of an id from a real e-mail; and the values of the “DateSent” and “TimeSent” fields may be computed as random offsets from the corresponding fields from a real e-mail of the same user (after the real e-mail has been shifted).
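A fake e-mail with the fields described above could be assembled as follows. The field names From, To, Cc, Bcc, HasAttachments, Id, DateSent, and TimeSent come from the text; the `Body` field name, the date and time formats, the decoy URL, the template wording, the 1 to 240 minute offset range, and the three-character Id mutation are all assumptions of this sketch.

```python
import random
import string
from datetime import datetime, timedelta
from typing import Tuple

FAKE_URL = "https://portal.example.com/login"  # hypothetical decoy site
BODY_TEMPLATE = ("Hi,\n\nAs requested, here is the login for the new portal:\n"
                 "{url}\nUsername: {username}\nPassword: {password}\n")  # hypothetical template


def make_fake_email(real: dict, username: str, rng: random.Random,
                    max_offset_minutes: int = 240) -> Tuple[dict, str]:
    """Build a decoy e-mail from one real (already shifted) e-mail of the same user."""
    # Assumed formats of the DateSent/TimeSent fields.
    sent = datetime.strptime(f"{real['DateSent']} {real['TimeSent']}", "%Y-%m-%d %H:%M:%S")
    sent += timedelta(minutes=rng.randint(1, max_offset_minutes))  # random offset from the real e-mail
    # Randomly modify a few alphanumeric characters of the real Id.
    new_id = list(real["Id"])
    for i in rng.sample(range(len(new_id)), k=min(3, len(new_id))):
        if new_id[i].isalnum():
            new_id[i] = rng.choice(string.ascii_lowercase + string.digits)
    password = "".join(rng.choice(string.ascii_letters + string.digits) for _ in range(12))
    fake = {
        "From": real["From"],  # taken from a real e-mail of the same user
        "To": f"{username}@gmail.com",  # fake Gmail address based on the username
        "Cc": None,
        "Bcc": None,
        "HasAttachments": False,
        "Id": "".join(new_id),
        "DateSent": sent.strftime("%Y-%m-%d"),
        "TimeSent": sent.strftime("%H:%M:%S"),
        "Body": BODY_TEMPLATE.format(url=FAKE_URL, username=username, password=password),
    }
    return fake, password
```

A seeded `random.Random` keeps the sketch reproducible; real credentials would be better drawn from the `secrets` module. Serializing with `json.dumps` turns `None` and `False` into `null` and `false`.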
  • the user may be required to specify the base directory within which all user directories reside. Additionally, the user may specify various optional parameters. If the user specifies a command with an incorrect format, a message may be displayed, such as the example screenshot depicted in FIG. 7 .
  • the “-c” option may enable the user to train and apply their own chunker (instead of a pre-trained chunker) as explained earlier.
  • the other options could allow the user to specify deltas for shifting times and dates, to specify one or more .pshift files for shifting detected people, and to specify the name of the log file that is produced while the system is running.
  • the system has been tested on a corpus consisting of: (1) a subset of the Enron E-mail Dataset including 8,419 e-mails from 20 users, all converted to the proper .json format; (2) 215 .docx and .pdf files from the MITRE corpus; these MITRE files have been randomly scattered across user file folders and randomly added as attachments to e-mails; (3) 118 .txt files, representing the MITRE .pdf files converted to text (these are Unicode text files), plus one additional manually created ASCII .txt file; these .txt files were randomly scattered across user file folders (but these are not used as attachments); and (4) one additional complex .json file, including complex, formatted attachments and a .json field with an HTML-formatted body.
  • Assuming that the test corpus is placed in the directory “corpus/enron_plus_mitre” relative to the main system, a test run using all of the provided .pshift files, with specified deltas of -500 days and +630 minutes, can be run as follows: python batch_process_json.py corpus/enron_plus_mitre -d -500 -m 630 -l log_1.txt -p KenLay.pshift -p DougGilbert-smith.pshift -p NatalieMcCarthy.pshift -p WandaCuny.pshift -p CarlReiber.pshift
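A command-line interface accepting the options described above can be sketched with argparse. The -c, -d, -m, and -p flags come from the text and the example command; -l is this sketch's reading of the garbled log-file option, and the `dest` names are hypothetical. On a malformed command, argparse prints a usage message and exits, comparable to the message shown in FIG. 7.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Shift dates, times, and names in a corpus of user directories.")
    parser.add_argument("base_dir", help="base directory containing one directory per user (required)")
    parser.add_argument("-c", dest="chunker", help="pickled chunker to use instead of the pre-trained one")
    parser.add_argument("-d", dest="days", type=int, default=0, help="delta in days for shifting dates")
    parser.add_argument("-m", dest="minutes", type=int, default=0, help="delta in minutes for shifting times")
    parser.add_argument("-l", dest="log_file", help="name of the log file written during the run")
    parser.add_argument("-p", dest="pshift_files", action="append", help=".pshift template file (repeatable)")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    args.pshift_files = args.pshift_files or []  # avoid a shared mutable default
    return args
```

Because no option here looks like a negative number, argparse accepts `-d -500` as the value -500.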
  • FIG. 8 shows a sample screenshot part way through one test run of the system.
  • the user may be updated after every 100 e-mails and non-attachment files have been processed.
  • the log file (not shown here), which can be examined as the system is running or afterward, may contain much more detailed information.
  • the system may terminate after all e-mails and files have been processed.
  • alternatively, a daemon may be run in the background to automatically process new users, e-mails, or files whenever they appear.
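Daemon mode could be approximated by periodic polling, as in this minimal sketch. Polling, the 30-second interval, ordering new files by modification time, and treating files present at startup as already processed are all assumptions; a file-system notification API would be an alternative. `scan`, `poll_once`, and `run_daemon` are hypothetical names.

```python
import time
from pathlib import Path
from typing import Callable, List, Set

WATCHED = ("Email", "Files")  # per-user input subdirectories (FIG. 1)


def scan(base: Path) -> Set[Path]:
    """Every file currently under <base>/<user>/Email or <base>/<user>/Files."""
    found: Set[Path] = set()
    for user_dir in base.iterdir():  # re-listed each time, so new users are picked up
        for sub in WATCHED:
            d = user_dir / sub
            if d.is_dir():
                found.update(p for p in d.rglob("*") if p.is_file())
    return found


def poll_once(base: Path, seen: Set[Path], process: Callable[[Path], None]) -> List[Path]:
    """Process files that appeared since the last scan, oldest modification time first."""
    new = sorted(scan(base) - seen, key=lambda p: p.stat().st_mtime)
    for path in new:
        process(path)
        seen.add(path)
    return new


def run_daemon(base: Path, process: Callable[[Path], None], interval: float = 30.0) -> None:
    seen = scan(base)  # assume files present at startup were handled by a normal batch run
    while True:
        poll_once(base, seen, process)
        time.sleep(interval)
```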
  • the user may also configure many different aspects of the system through a configuration file.
  • These configurable properties may include: (1) the default name of the log file; (2) the names of the subdirectories for original and modified files and e-mails in the corpus; (3) the expected fields in the .json files; (4) the probabilities determining how often fake e-mails are randomly generated; (5) the content of the templates for generating fake e-mails; (6) the range of random offsets from the base e-mails for timestamps of fake e-mails; (7) whether or not to delete original e-mails and files after the modified versions have been created; and (8) the user information for the user running the system, so they may be notified when an unauthorized user has been lured to a fake website.
  • these properties tend to be more technical properties that are not likely to change frequently between runs of the system.
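The eight configurable properties could live in an INI file read with the standard library's configparser. The INI format itself, the section and key names, and the example values are assumptions of this sketch; the numbered comments map each entry to the list above.

```python
import configparser
from typing import Any, Dict

EXAMPLE_CONFIG = """
[logging]
log_file = dgs.log

[corpus]
email_dir = Email
files_dir = Files
changed_email_dir = Changed Emails
changed_files_dir = Changed_Files
json_fields = From, To, Cc, Bcc, DateSent, TimeSent, HasAttachments, Id

[fake_email]
probability = 0.05
template_dir = fake_templates
min_offset_minutes = 5
max_offset_minutes = 240

[cleanup]
delete_originals = no

[notify]
operator_email = security-team@example.com
"""


def load_config(text: str) -> Dict[str, Any]:
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    subdir_keys = ("email_dir", "files_dir", "changed_email_dir", "changed_files_dir")
    return {
        "log_file": cfg.get("logging", "log_file"),  # (1)
        "subdirs": {k: cfg.get("corpus", k) for k in subdir_keys},  # (2)
        "json_fields": [f.strip() for f in cfg.get("corpus", "json_fields").split(",")],  # (3)
        "fake_probability": cfg.getfloat("fake_email", "probability"),  # (4)
        "template_dir": cfg.get("fake_email", "template_dir"),  # (5)
        "offset_minutes": (cfg.getint("fake_email", "min_offset_minutes"),
                           cfg.getint("fake_email", "max_offset_minutes")),  # (6)
        "delete_originals": cfg.getboolean("cleanup", "delete_originals"),  # (7)
        "operator_email": cfg.get("notify", "operator_email"),  # (8)
    }
```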
  • FIG. 9 shows an example e-mail from the test corpus in its original .json format (left) and after being modified by the system using the example command shown earlier (right).
  • the body of the e-mail contained one date, which was detected and shifted, and one name that matched our example “KenLay.pshift” file (shown earlier), which was also detected and modified. Additionally, the “DateSent” and “TimeSent” fields of the e-mail were shifted.
  • the system may include a json_diff utility, written in Python and runnable from the command line, which displays the differences between two specified .json files in a diff-like format.
  • FIG. 10 shows a screenshot displaying the output of the json_diff utility used to compare the original and modified .json files displayed in FIG. 9 .
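A utility in the spirit of json_diff can be sketched with the standard library's `json` and `difflib` modules. Normalizing both files with sorted keys and fixed indentation before diffing is an assumption about how such a tool might avoid reporting reordered keys; the real utility's output format is shown only in FIG. 10, and `json_diff` here is a hypothetical function.

```python
import difflib
import json
import sys
from typing import List


def json_diff(old_path: str, new_path: str) -> List[str]:
    """Unified-diff lines between two .json files after normalizing key order and indentation."""
    with open(old_path, encoding="utf-8") as f_old, open(new_path, encoding="utf-8") as f_new:
        old, new = json.load(f_old), json.load(f_new)
    old_lines = json.dumps(old, indent=2, sort_keys=True, ensure_ascii=False).splitlines()
    new_lines = json.dumps(new, indent=2, sort_keys=True, ensure_ascii=False).splitlines()
    return list(difflib.unified_diff(old_lines, new_lines, fromfile=old_path,
                                     tofile=new_path, lineterm=""))


if __name__ == "__main__" and len(sys.argv) == 3:
    print("\n".join(json_diff(sys.argv[1], sys.argv[2])))
```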
  • FIG. 11 shows an excerpt from a .docx file from the test corpus in its original state (left) and after being processed by the sample command shown earlier (right).
  • This particular .docx file was one of the attachments for our complex .json file; the body contains the text of an e-mail from the Enron dataset, formatted in a variety of ways. Two dates were detected and modified where they appear in the document.
  • the .pshift file used indicates that the name “Wanda Curry” should change to the name “Melanie Curtis”, and that only the first name is required for a match. The name was detected in two locations in this excerpt, but missed in a third location.
  • FIGS. 12A and 12B show an excerpt from a .pdf file in the test corpus in its original state ( FIG. 12A ) and after being processed by the sample command shown earlier ( FIG. 12B ).
  • This particular document came from the MITRE corpus. Examining closely, one can see that in this excerpt, there was just a single date (specifying a year), and it was correctly detected and shifted.
  • one of the .pshift files that we have been using for testing indicates that the name “Carl Reiber” should be changed to “Derek Hunt,” that only the last name is required, and the first name can be represented with just the initial “C.” In this excerpt, five instances of the name were detected and modified, and two instances were missed.
  • the token near the top of the document, “Spotlight_Reiber_ver4 July28”, may contain a name and a date within a single larger token; our system will not be able to identify concepts that occupy only part of a larger token.
  • the modified document looks similar to the original and completely reasonable. As explained earlier, formatting of .pdf files is only approximately maintained. Note also that metadata about the document, such as the title that appears in the title bar, is preserved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Computer Hardware Design (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A system that generates decoy emails and documents by automatically detecting concepts such as dates, times, people, and locations in e-mails and documents, and shifting those concepts. The system may also generate an email or document reciting a URL associated with a fake website and purported login credentials for the fake website. The system may send an alert to a user of the system when someone seeks to access the fake website.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of U.S. patent application Ser. No. 15/233,563, filed on Aug. 10, 2016, which claims the benefit of U.S. Provisional Patent Application No. 62/202,997, filed on Aug. 10, 2015. The entire contents of these applications are incorporated herein by reference.
  • BACKGROUND
  • The systems and methods of the present invention may be used to detect and thwart hackers or other unauthorized users of computer systems.
  • SUMMARY OF INVENTION
  • The system may process e-mails in .json format plus attachments and standalone files within directory structures matching certain specifications. E-mails and files may be processed one user at a time, in sorted order according to timestamps. Within each e-mail or file, dates and times may be detected and shifted according to user-specified deltas, and people's names are detected and shifted according to user-provided templates. Formatting may be preserved exactly for .docx files and approximately for .pdf files. Text files and HTML-formatted e-mails may also be handled similarly. The accuracy achieved for detecting recognized concepts may be high, based on a Bayesian machine learning algorithm for named entity recognition followed by a second phase to exclude false positives. During processing, fake e-mails including enticing content may be occasionally inserted to lure an unauthorized user to reveal themselves by visiting a fake website and entering generated credentials. The system may also be converted to a daemon that runs in the background and automatically detects and processes new users, e-mails, or files as they appear.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • A further understanding of the invention can be obtained by reference to exemplary embodiments set forth in the illustrations of the accompanying drawings. Although the illustrated embodiments are merely exemplary of systems, methods, and apparatuses for carrying out the invention, both the organization and method of operation of the invention, in general, together with further objectives and advantages thereof, may be more easily understood by reference to the drawings and the following description. Like reference numbers generally refer to like features (e.g., functionally similar and/or structurally similar elements).
  • The drawings are not necessarily depicted to scale; in some instances, various aspects of the subject matter disclosed herein may be shown exaggerated or enlarged in the drawings to facilitate an understanding of different features. Also, the drawings are not intended to limit the scope of this invention, which is set forth with particularity in the claims as appended hereto or as subsequently amended, but merely to clarify and exemplify the invention.
  • FIG. 1 depicts a directory structure in accordance with an embodiment of the invention;
  • FIG. 2 depicts a screenshot in accordance with an embodiment of the invention;
  • FIG. 3 depicts a screenshot in accordance with an embodiment of the invention;
  • FIG. 4 depicts a flowchart for processing an e-mail or file according to an embodiment of the invention;
  • FIG. 5 depicts an example of a .pshift file according to an embodiment of the invention;
  • FIG. 6 depicts an e-mail generated in accordance with an embodiment of the invention;
  • FIG. 7 depicts a screenshot in accordance with an embodiment of the invention;
  • FIG. 8 depicts a screenshot in accordance with an embodiment of the invention;
  • FIG. 9 depicts an original email and the email modified in accordance with an embodiment of the invention;
  • FIG. 10 depicts a screenshot in accordance with an embodiment of the invention;
  • FIG. 11 depicts an excerpt from a .docx file and the excerpt modified in accordance with an embodiment of the invention; and
  • FIGS. 12A and 12B depict an excerpt from a .pdf file (FIG. 12A) and the excerpt modified in accordance with an embodiment of the invention (FIG. 12B).
  • DETAILED DESCRIPTION OF THE INVENTION
  • The invention may be understood more readily by reference to the following detailed descriptions of embodiments of the invention. However, techniques, systems, and operating structures in accordance with the invention may be embodied in a wide variety of forms and modes, some of which may be quite different from those in the disclosed embodiments. Also, the features and elements disclosed herein may be combined to form various combinations without exclusivity, unless expressly stated otherwise. Consequently, the specific structural and functional details disclosed herein are merely representative. Yet, in that regard, they are deemed to afford the best embodiments for purposes of disclosure and to provide a basis for the claims herein, which define the scope of the invention. It must be noted that, as used in the specification and the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the context clearly indicates otherwise.
  • Use of the term “exemplary” means illustrative or by way of example, and any reference herein to “the invention” is not intended to restrict or limit the invention to the exact features or steps of any one or more of the exemplary embodiments disclosed in the present specification. Also, repeated use of the phrase “in one embodiment,” “in an exemplary embodiment,” or similar phrases do not necessarily refer to the same embodiment, although they may. It is also noted that terms like “preferably,” “commonly,” and “typically,” are not used herein to limit the scope of the claimed invention or to imply that certain features are critical, essential, or even important to the structure or function of the claimed invention. Rather, those terms are merely intended to highlight alternative or additional features that may or may not be used in a particular embodiment of the present invention.
  • For exemplary methods or processes of the invention, the sequence and/or arrangement of steps described herein are illustrative and not restrictive. Accordingly, it should be understood that, although steps of various processes or methods may be shown and described as being in a sequence or temporal arrangement, the steps of any such processes or methods are not limited to being carried out in any particular sequence or arrangement, absent an indication otherwise. Indeed, the steps in such processes or methods generally may be carried out in various different sequences and arrangements while still falling within the scope of the present invention.
  • The Decoy Generating System (“DGS” or “RAGS”) of the present invention processes e-mails along with their attachments and other user files, one user directory at a time. The user directories may exist within a specified base directory that is provided to the system. Each user directory may contain a subdirectory called “Email” containing e-mails and (as separate files) attachments; another subdirectory called “Files” may contain other user files. Processed (i.e., shifted) e-mails may be placed in a subdirectory called “Changed_Emails,” and processed files may be placed in a subdirectory called “Changed_Files.” This is summarized in FIG. 1. The DGS system may be implemented in Python and may make extensive use of the Natural Language Toolkit (NLTK), a popular platform for building Python applications that process natural language.
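The directory convention described above can be illustrated with a minimal `pathlib` sketch. The helper names `iter_user_dirs` and `plan_outputs` are hypothetical, and the output folder for e-mails is assumed to use the same underscore naming as “Changed_Files”; the real system may organize this differently.

```python
from pathlib import Path

# Subdirectory names from the description; treated here as configurable defaults.
EMAIL_DIR = "Email"
FILES_DIR = "Files"
CHANGED_EMAIL_DIR = "Changed_Emails"  # assumed spelling
CHANGED_FILES_DIR = "Changed_Files"


def iter_user_dirs(base_dir):
    """Yield each user directory under the base directory, in sorted order."""
    for path in sorted(Path(base_dir).iterdir()):
        if path.is_dir():
            yield path


def plan_outputs(user_dir):
    """Map every input file of one user to the path its processed copy would get."""
    plan = {}
    pairs = ((EMAIL_DIR, CHANGED_EMAIL_DIR), (FILES_DIR, CHANGED_FILES_DIR))
    for src_name, dst_name in pairs:
        src_dir = user_dir / src_name
        if not src_dir.is_dir():
            continue  # a user may have no e-mails or no files
        for src in sorted(src_dir.iterdir()):
            if src.is_file():
                plan[src] = user_dir / dst_name / src.name
    return plan
```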
  • All e-mails may be stored in .json format. When the DGS system processes e-mails and files, it may detect dates and times, and shift them according to deltas specified by the user. The current system may also detect names of people and shift them according to templates specified by the user. Detection of other named entities, such as locations and organizations, is an area of further investigation.
  • The system may process one user directory at a time, and within each user directory, the system may process e-mails and files in sorted order according to their timestamps. The system may randomly insert a fake e-mail including enticing content. The content may include the URL of a fake website and login credentials for the website. The user may be alerted, for example via e-mail, if anyone tries to log in to the fake website using credentials associated with the user's account. The system may also run as a daemon that can detect when new user directories, e-mails, or files are added or created, and process them automatically at such times.
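The timestamp ordering can be sketched as follows. This is an illustration under assumptions the text does not state: each .json e-mail is assumed to carry ISO-formatted `DateSent` and `TimeSent` fields, and a plain file's timestamp is assumed to be its modification time. The function names are hypothetical.

```python
import json
from datetime import datetime
from pathlib import Path


def email_timestamp(path):
    """Send time of a .json e-mail (assumes ISO 'DateSent' and 'TimeSent' fields)."""
    record = json.loads(Path(path).read_text(encoding="utf-8"))
    return datetime.fromisoformat(f"{record['DateSent']}T{record['TimeSent']}")


def file_timestamp(path):
    """Modification time of a non-e-mail file (an assumption, not from the source)."""
    return datetime.fromtimestamp(Path(path).stat().st_mtime)


def processing_order(email_paths, file_paths):
    """Merge e-mails and files into one list sorted oldest first."""
    items = [(email_timestamp(p), p) for p in email_paths]
    items += [(file_timestamp(p), p) for p in file_paths]
    # Break timestamp ties by path so the order is deterministic.
    items.sort(key=lambda item: (item[0], str(item[1])))
    return [path for _, path in items]
```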
  • A. Training the DGS System
  • The system may automatically detect concepts including dates, times, people, and locations in e-mails and files, using an approach known as named entity recognition. A “chunker” predicts a category for every token (i.e., word) in a document using a Bayesian machine learning algorithm. Each token may begin a concept (e.g., label B-PERSON), continue a concept (e.g., label I-PERSON), or not be part of any recognized concept (label O). General features used for learning include the token itself, the token's part of speech (POS), the previous and next tokens and their POS tags, and the previous token's label. Several concept-specific features have been added to improve accuracy (e.g., Boolean features indicating whether a token appears in a list of months or in lists of names published by the U.S. Census Bureau). A second phase using hand-crafted rules is applied to eliminate some false positives. For example, predicted dates are excluded if they are not verified by the Python dateutil module, and names of people are excluded if they contain ‘@’, since these are probably e-mail addresses.
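The feature set, the B/I/O labeling scheme, and the second-phase filter can be sketched with the standard library alone. This is an illustration, not the system's code: the classifier itself is omitted, the feature names are hypothetical, and a short list of `strptime` formats stands in for the dateutil verification the text describes.

```python
import re
from datetime import datetime

MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
    "jan", "feb", "mar", "apr", "jun", "jul", "aug", "sep", "sept", "oct", "nov", "dec",
}
# Stand-in for dateutil-based verification (an assumption for this sketch).
DATE_FORMATS = ("%B %d, %Y", "%b %d, %Y", "%m/%d/%Y", "%Y-%m-%d", "%Y")


def token_features(tokens, pos_tags, i, prev_label):
    """General and concept-specific features for token i."""
    return {
        "word": tokens[i].lower(),
        "pos": pos_tags[i],
        "prev_word": tokens[i - 1].lower() if i > 0 else "<START>",
        "prev_pos": pos_tags[i - 1] if i > 0 else "<START>",
        "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "<END>",
        "next_pos": pos_tags[i + 1] if i + 1 < len(tokens) else "<END>",
        "prev_label": prev_label,
        "in_month_list": tokens[i].lower().strip(".") in MONTHS,
    }


def bio_to_chunks(tokens, labels):
    """Group B-X / I-X labels into (category, text) chunks; O tokens end a chunk."""
    chunks, current_type, current = [], None, []
    for token, label in zip(tokens, labels):
        starts_new = label.startswith("B-") or (
            label.startswith("I-") and label[2:] != current_type)
        if starts_new:
            if current:
                chunks.append((current_type, " ".join(current)))
            current_type, current = label[2:], [token]
        elif label.startswith("I-"):
            current.append(token)
        else:
            if current:
                chunks.append((current_type, " ".join(current)))
            current_type, current = None, []
    if current:
        chunks.append((current_type, " ".join(current)))
    return chunks


def looks_like_valid_date(text):
    cleaned = re.sub(r"\s+,", ",", text.strip())  # "May 5 , 2001" -> "May 5, 2001"
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(cleaned, fmt)
            return True
        except ValueError:
            continue
    return False


def keep_chunk(category, text):
    """Second-phase rules that discard likely false positives."""
    if category == "PERSON" and "@" in text:
        return False  # probably an e-mail address
    if category == "DATE":
        return looks_like_valid_date(text)
    return True
```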
  • The chunker is trained on files in which instances of each relevant concept have been manually labeled. A sample training corpus may consist of 94 news documents from the publicly available Information Extraction: Entity Recognition (IEER) corpus and 100 randomly selected e-mails from the Enron e-mail dataset. Cross-validation experiments may be performed within the training set to evaluate the chunker's accuracy in detecting dates, times, and people using standard metrics from the field of natural language processing (NLP). The metrics used may include recall, the percentage of actual tokens from the category that are correctly predicted to belong to the category; precision, the percentage of tokens predicted to belong to the category that actually do belong to it; and F1, the harmonic mean of recall and precision, which combines them into a single metric that is closer to the lower of the two. Based on cross-validation experiments, the system can achieve average F1 scores of about 94% for dates, about 91% for times, and about 70% for people.
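The token-level metrics defined above can be computed as in this sketch; `token_prf` is a hypothetical helper, and a token counts as belonging to a category if its label is either `B-` or `I-` for that category. For example, if a chunker labels 2 tokens as DATE and both are correct, but the gold standard has 4 DATE tokens, precision is 1.0, recall is 0.5, and F1 is 2·1.0·0.5/1.5 ≈ 0.67, closer to the lower value than the arithmetic mean of 0.75.

```python
def token_prf(gold, predicted, category):
    """Token-level precision, recall and F1 for one category (e.g. 'DATE')."""
    def in_cat(label):
        return label in (f"B-{category}", f"I-{category}")

    true_pos = sum(1 for g, p in zip(gold, predicted) if in_cat(g) and in_cat(p))
    predicted_pos = sum(1 for p in predicted if in_cat(p))
    actual_pos = sum(1 for g in gold if in_cat(g))
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    # F1 is the harmonic mean, which is pulled toward the smaller of the two.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```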
  • A typical user should never need to retrain the chunker. However, the system allows users to train their own chunker and to specify that it be used in place of the default chunker (which, for example, may have been trained using the training set and methodology indicated above). A graphical user interface for labeling training documents may be provided, referred to herein as the Named Entity Labeler. In the NLP literature, the term “named entity” refers to the concepts detected by this sort of approach, such as dates, times, and names of people.
  • A screenshot of our Named Entity Labeler being used to label one of the e-mails in a training set is shown in FIG. 2. The user can select the type of chunk from a “Chunk” menu, and then select portions of text that match that chunk. Clicking the left mouse button may label a selection as an example of the chunk, and the graphical user interface (GUI) may automatically expand the selection to make sure it includes complete tokens. Right-clicking a previously labeled chunk may delete its label. When the user saves the file, it may be saved in two formats. One, with the extension .nel, may be a text file with metadata that the GUI uses to track current labels. The other, a .train file, may have the format required for training the chunker.
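The selection-expansion behavior and the conversion of labeled spans into per-token labels can be sketched as below. The regex tokenizer and both function names are hypothetical, and the actual .nel and .train file formats are not specified in the text, so neither is reproduced here.

```python
import re


def token_spans(text):
    """Character spans of tokens: runs of word characters, or single punctuation marks."""
    return [m.span() for m in re.finditer(r"\w+|[^\w\s]", text)]


def expand_to_tokens(text, start, end):
    """Widen the selection [start, end) to cover every token it touches.

    Returns None if the selection touches no token (e.g. only whitespace).
    """
    overlapping = [(s, e) for s, e in token_spans(text) if s < end and e > start]
    if not overlapping:
        return None
    return overlapping[0][0], overlapping[-1][1]


def spans_to_bio(text, labeled_spans):
    """Turn (start, end, category) labels into one B-/I-/O label per token."""
    tags = []
    for s, e in token_spans(text):
        tag = "O"
        for label_start, label_end, category in labeled_spans:
            if label_start <= s and e <= label_end:
                tag = ("B-" if s == label_start else "I-") + category
                break
        tags.append(tag)
    return tags
```

For example, selecting only “neth La” inside “Call Kenneth Lay today.” would expand to the span covering “Kenneth Lay”.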
  • Once enough documents have been labeled to constitute a training set, a user can train a chunker using a Python script, which can easily be run from the interactive Python shell. FIG. 3 shows a screenshot of part of a computer screen indicating how the chunker may be trained. The first parameter gives the name of the pickled chunker to create (“pickling” is how Python applications typically implement object serialization). The second parameter is a text file that contains the full paths and filenames of the documents to be used for training. The function applied in FIG. 3, “create date chunker,” creates a chunker that recognizes dates. Alternatively, a function may create a chunker that is capable of predicting all concepts labeled in the training files. On a typical machine, the entire training process may take only a few seconds. Additional Python scripts may be implemented to perform cross-validation experiments within the training set and to automatically compute and display evaluation metrics for all concepts (not shown here).
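The two-parameter training call (an output name for the pickled chunker and a text file listing training documents) can be sketched as follows. Several parts are hypothetical: `MostFrequentLabelChunker` is a toy stand-in for the Bayesian chunker that simply predicts each token's most frequent training label, `create_chunker` is an invented name, and the `token<TAB>label` training format is an assumption because the .train format is not described.

```python
import pickle
from collections import Counter, defaultdict
from pathlib import Path


class MostFrequentLabelChunker:
    """Toy chunker: predicts the label seen most often for each token in training."""

    def __init__(self):
        self.label_counts = defaultdict(Counter)

    def train(self, sentences):
        for tokens, labels in sentences:
            for token, label in zip(tokens, labels):
                self.label_counts[token.lower()][label] += 1

    def predict(self, tokens):
        return [
            self.label_counts[t.lower()].most_common(1)[0][0]
            if t.lower() in self.label_counts else "O"
            for t in tokens
        ]


def read_train_file(path):
    """Parse an assumed 'token<TAB>label' format, blank lines between sentences."""
    sentences, tokens, labels = [], [], []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            if tokens:
                sentences.append((tokens, labels))
                tokens, labels = [], []
            continue
        token, label = line.split("\t")
        tokens.append(token)
        labels.append(label)
    if tokens:
        sentences.append((tokens, labels))
    return sentences


def create_chunker(pickle_name, file_list_path):
    """Train on every file listed (one path per line) and pickle the result."""
    chunker = MostFrequentLabelChunker()
    for line in Path(file_list_path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            chunker.train(read_train_file(line.strip()))
    with open(pickle_name, "wb") as fh:
        pickle.dump(chunker, fh)  # serialize so the chunker can be reused later
    return chunker
```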
  • B. Detecting and Shifting Concepts
  • When the DGS system processes e-mails, attachments, or other files, it may first extract the textual content from the document, then segment the text into sentences, then tokenize each sentence (i.e., split the sentence into words plus important punctuation), then compute the part of speech (i.e., syntactic category) of each token, then compute the other features used for learning, and then apply the chunker to detect recognized concepts (e.g., dates, times, names of people, locations). For each predicted concept, a second phase may be applied to eliminate false positives. Each date and time may then be shifted according to deltas specified by the user, using Python's datetime module. Names of people that match templates in user-provided .pshift files may also be modified according to those templates, as explained below. After all shifts are applied, the document may be reconstructed and saved in the proper destination folder. A simplified outline of the system workflow for processing a single e-mail or file is shown in FIG. 4.
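The workflow above can be outlined as a small pipeline. This is a sketch under stated assumptions: regular expressions stand in for NLTK's sentence and word tokenizers, POS tagging and feature computation are folded into a caller-supplied `chunk_fn`, and `shift_fn` is likewise hypothetical. Replacements are made on character offsets, right to left, so the text around each shifted concept keeps its original spacing.

```python
import re

TOKEN_RE = re.compile(r"\w+|[^\w\s]")  # words plus punctuation


def split_sentences(text):
    """Naive sentence splitter (stand-in for an NLTK sentence tokenizer)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def process_text(text, chunk_fn, shift_fn):
    """Segment -> tokenize -> chunk -> shift each concept -> reconstruct.

    chunk_fn(tokens) returns (start, end, category) token-index ranges.
    shift_fn(category, text) returns replacement text, or None to keep it.
    """
    result = []
    for sentence in split_sentences(text):
        matches = list(TOKEN_RE.finditer(sentence))
        tokens = [m.group() for m in matches]
        # Work right to left so earlier character offsets remain valid.
        for start, end, category in sorted(chunk_fn(tokens), reverse=True):
            lo, hi = matches[start].start(), matches[end - 1].end()
            new_text = shift_fn(category, sentence[lo:hi])
            if new_text is not None:
                sentence = sentence[:lo] + new_text + sentence[hi:]
        result.append(sentence)
    return " ".join(result)  # note: inter-sentence whitespace is normalized
```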
  • Retrieving the text from a file, represented by the first box in the outline, may be more complicated for some file types than others. For e-mails represented as .json files, Python's json module can be used to obtain and potentially modify the various fields. Text files are also simple to deal with. The system may handle HTML-formatted e-mails (and other .html files, if any), .docx attachments and files, and .pdf attachments and files. Handling HTML and .docx files is similar, because .docx files are stored as compressed XML documents in which specific tags indicate textual fields; the lxml library for Python is useful for handling both formats. Complications can still arise, since sentences may be split across HTML or XML nodes. The system may restore all modified tokens to their original nodes to preserve formatting. It is difficult, however, to manipulate .pdf files directly. The system may therefore rely on publicly available utilities to convert .pdf files to .html, process the .html, and convert the result back to .pdf. The conversion is not perfect, so the formatting of .pdf files is only approximately preserved. Any other file type, either as an attachment or as a standalone file, is copied to the destination directory unmodified.
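Dispatching on file type can be sketched with the standard library. Only the .json and .txt cases are shown; .html, .docx, and .pdf handling would need lxml and external PDF converters and is omitted. The handler names and the assumption that the text lives in `Subject` and `Body` fields are hypothetical.

```python
import json
import shutil
from pathlib import Path


def process_json_email(src, dst, transform_text, text_fields=("Subject", "Body")):
    """Load a .json e-mail, transform its text fields, and write the result."""
    record = json.loads(Path(src).read_text(encoding="utf-8"))
    for field in text_fields:
        if isinstance(record.get(field), str):
            record[field] = transform_text(record[field])
    Path(dst).write_text(json.dumps(record, indent=2), encoding="utf-8")


def process_plain_text(src, dst, transform_text):
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(transform_text(text), encoding="utf-8")


HANDLERS = {".json": process_json_email, ".txt": process_plain_text}


def process_any(src, dst, transform_text):
    """Dispatch on extension; unsupported types are copied unmodified."""
    Path(dst).parent.mkdir(parents=True, exist_ok=True)
    handler = HANDLERS.get(Path(src).suffix.lower())
    if handler is None:
        shutil.copy2(src, dst)  # copy contents and metadata as-is
        return "copied"
    handler(src, dst, transform_text)
    return "processed"
```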
  • Shifting dates and times, once predicted and verified, may be achieved using Python's datetime module (examples are described below). To specify names of people to shift, and how to shift them, the user can specify one or more templates in the form of .pshift files. Each template specifies a person to shift, if detected, and how to shift that person. Each template must include: (1) all allowable variations of the person's first name, middle name, and last name; (2) how each allowable variation of any part of the name should be modified; and (3) which parts of the person's name are required to count as a match.
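Shifting a verified date or time with `datetime.timedelta` can be sketched as follows. The helper names are hypothetical, and the sketch assumes the detected text has already been matched to a known `strptime` format.

```python
from datetime import datetime, timedelta


def shift_timestamp(value, days=0, minutes=0):
    """Shift a verified date/time by user-specified deltas."""
    return value + timedelta(days=days, minutes=minutes)


def shift_date_string(text, fmt, days=0, minutes=0):
    """Parse a detected date string, shift it, and write it back in the same format."""
    shifted = shift_timestamp(datetime.strptime(text, fmt), days, minutes)
    return shifted.strftime(fmt)
```

With the deltas used in the test run described later (−500 days and +630 minutes), 09:30 on 14 May 2001 becomes 20:00 on 31 December 1999; `timedelta` handles the leap day in February 2000 automatically.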
  • An example of a .pshift file specifying rules for shifting variations of the name Ken Lay is shown in FIG. 5. Note that to be considered a match for this template, a detected name predicted by the chunker must match at least the first and last name as specified in the template. Matches for the middle name are allowed but not required (but a middle name that is present and does not match the template would exclude the match). When the template is matched, each part of the name would be shifted as indicated in the template. Templates may be case insensitive and flexible with respect to whitespace. The system may attempt to use the same style of capitalization for shifted tokens as for original tokens. Therefore, in the example embodiment shown in FIG. 5, “Ken Lay” would become “John Public”, “KENNETH L. LAY” would become “JOHNNY Q. PUBLIC”, “Lay, Kenneth Lee” would become “Public, Johnny Quin”, etc. Names such as “Ken” or “Lay” by themselves would not be shifted, since they do not match all required fields according to the first row of the template. Also, a name such as “Ken A. Lay” would not be shifted, since the middle initial “A” does not match the allowable middle names in the template, even though that field is not required.
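The matching rules described above can be sketched as follows. The .pshift syntax of FIG. 5 is not reproduced; instead, a template is held in a hypothetical `PersonTemplate` structure whose variation table is inferred from the examples in the text (“Ken Lay” to “John Public”, “KENNETH L. LAY” to “JOHNNY Q. PUBLIC”, “Lay, Kenneth Lee” to “Public, Johnny Quin”). Capitalization is copied only in the simple all-uppercase and all-lowercase cases; otherwise the template's own spelling is used.

```python
from dataclasses import dataclass


@dataclass
class PersonTemplate:
    """Hypothetical in-memory form of a .pshift template."""
    first: dict          # lower-cased variation -> replacement
    middle: dict
    last: dict
    required: frozenset  # e.g. frozenset({"first", "last"})


def _match_case(original, replacement):
    if original.isupper():
        return replacement.upper()
    if original.islower():
        return replacement.lower()
    return replacement


def _split_name(name, t):
    name = " ".join(name.split())  # flexible with respect to whitespace
    if "," in name:  # "Last, First Middle"
        last, rest = (part.strip() for part in name.split(",", 1))
        rest = rest.split()
        return {"last": last, "first": rest[0] if rest else None,
                "middle": " ".join(rest[1:]) or None}, True
    parts = name.split()
    if not parts:
        return {}, False
    if len(parts) == 1:  # a lone token: read it as whichever field it matches
        field = "last" if parts[0].lower() in t.last else "first"
        return {field: parts[0]}, False
    return {"first": parts[0], "middle": " ".join(parts[1:-1]) or None,
            "last": parts[-1]}, False


def shift_person(name, t):
    """Return the shifted name, or None if the template does not match."""
    fields, comma_form = _split_name(name, t)
    present = {f: tok for f, tok in fields.items() if tok}
    if not t.required <= set(present):
        return None  # a required part is missing
    variants = {"first": t.first, "middle": t.middle, "last": t.last}
    if any(tok.lower() not in variants[f] for f, tok in present.items()):
        return None  # a part is present but is not an allowed variation
    new = {f: _match_case(tok, variants[f][tok.lower()]) for f, tok in present.items()}
    given = " ".join(new[f] for f in ("first", "middle") if f in new)
    if comma_form:
        return f"{new['last']}, {given}" if given else new["last"]
    return " ".join(new[f] for f in ("first", "middle", "last") if f in new)


KEN_LAY = PersonTemplate(
    first={"ken": "John", "kenneth": "Johnny"},
    middle={"l": "Q", "l.": "Q.", "lee": "Quin"},
    last={"lay": "Public"},
    required=frozenset({"first", "last"}),
)
```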
  • C. Generating Fake E-Mails
  • At random points with configurable frequencies, believable fake e-mails are generated and inserted into a user's destination e-mail directory. The system may be limited to at most one fake e-mail generated per user. The content of the fake e-mails is based on configurable templates, and each template is applied at most once during a single run of the DGS system. Each generated fake e-mail may contain fake credentials. The fake e-mails are designed to entice a hacker who steals data into using the fake credentials at a fake website. Victims are automatically notified via e-mail when fake credentials have been used, indicating that their data has been stolen.
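The insertion constraints (a configurable probability, at most one fake e-mail per user, and each template used at most once per run) can be sketched as a planning step. The function name and the shape of its inputs are hypothetical.

```python
import random


def choose_fake_insertions(user_emails, templates, probability, rng=None):
    """Decide where fake e-mails go.

    user_emails maps each user to e-mail ids in timestamp order. Returns
    {user: (id of the e-mail the fake follows, template)}.
    """
    rng = rng or random.Random()
    unused = list(templates)
    rng.shuffle(unused)
    plan = {}
    for user, emails in user_emails.items():
        for email_id in emails:
            if not unused:
                return plan  # every template has been used in this run
            if rng.random() < probability:
                plan[user] = (email_id, unused.pop())
                break  # at most one fake e-mail per user
    return plan
```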
  • FIG. 6 shows the content of one generated fake e-mail. The general format of the file is .json, matching the format of original e-mails as specified by BAE. Each template may be different. In the example shown in FIG. 6, the “From” field has been taken from a real e-mail of the same user; the “To” field contains a fake Gmail address based on the user's username; the body is mostly fixed, except for the username and password. The values of the “Cc” and “Bcc” fields may be predetermined as null; the value of the “HasAttachments” field may be predetermined as false; the value of the “Id” field is a randomly modified version of an id from a real e-mail; and the values of the “DateSent” and “TimeSent” fields may be computed as random offsets from the corresponding fields of a real e-mail of the same user (after the real e-mail has been shifted).
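The field rules listed above can be sketched as a record builder. Assumptions are flagged: `DateSent` and `TimeSent` are taken to be ISO-formatted, the offset is always forward in time, the body template uses `str.format` placeholders, and the function name is hypothetical. The password is drawn from `secrets` so the planted credential is hard to guess; the caller would record it so the decoy website can recognize it.

```python
import random
import secrets
import string
from datetime import datetime, timedelta


def make_fake_email(real_email, username, body_template, rng, max_offset_minutes=720):
    """Build a fake .json e-mail record from a real (already shifted) one."""
    password = secrets.token_urlsafe(9)  # fake credential planted in the body
    sent = datetime.fromisoformat(f"{real_email['DateSent']}T{real_email['TimeSent']}")
    sent += timedelta(minutes=rng.randint(1, max_offset_minutes))
    # Randomly modify one character of a real id so it differs from the original.
    real_id = real_email["Id"]
    pos = rng.randrange(len(real_id))
    alphabet = [c for c in string.ascii_letters + string.digits if c != real_id[pos]]
    fake_id = real_id[:pos] + rng.choice(alphabet) + real_id[pos + 1:]
    email = {
        "From": real_email["From"],
        "To": f"{username}@gmail.com",
        "Cc": None,   # serialized as null
        "Bcc": None,
        "HasAttachments": False,
        "Id": fake_id,
        "DateSent": sent.date().isoformat(),
        "TimeSent": sent.time().isoformat(timespec="seconds"),
        "Body": body_template.format(username=username, password=password),
    }
    return email, password
```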
  • D. Running the DGS System
  • To run the system, the user may be required to specify the base directory within which all user directories reside. Additionally, the user may specify various optional parameters. If the user specifies a command with an incorrect format, a message may be displayed, such as the example screenshot depicted in FIG. 7. The “-c” option may enable the user to train and apply their own chunker (instead of a pre-trained chunker) as explained earlier. The other options could allow the user to specify deltas for shifting times and dates, to specify one or more .pshift files for shifting detected people, and to specify the name of the log file that is produced while the system is running.
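A command-line interface of this shape can be sketched with `argparse`, which already prints a usage message and exits when a command is malformed. This is a hypothetical reconstruction: the short options follow the text and the test command below (the `-l` letter is read from a partly garbled part of that command), while the long option names and defaults are invented for illustration.

```python
import argparse


def build_parser():
    """Hypothetical CLI for the DGS batch run."""
    p = argparse.ArgumentParser(description="Shift dates, times and people in a corpus.")
    p.add_argument("base_dir", help="directory containing one subdirectory per user")
    p.add_argument("-c", "--chunker", help="pickled chunker to use instead of the default")
    p.add_argument("-d", "--days", type=int, default=0, help="delta for shifting dates")
    p.add_argument("-m", "--minutes", type=int, default=0, help="delta for shifting times")
    p.add_argument("-p", "--pshift", action="append", help=".pshift template (repeatable)")
    p.add_argument("-l", "--log", default="dgs.log", help="name of the log file")
    return p
```

Because no option looks like a negative number, `argparse` accepts `-d -500` as a value rather than an option.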
  • The system has been tested on a corpus consisting of: (1) a subset of the Enron E-mail Dataset including 8,419 e-mails from 20 users, all converted to the proper .json format; (2) 215 .docx and .pdf files from the MITRE corpus, randomly scattered across user file folders and randomly added as attachments to e-mails; (3) 118 .txt files, representing the MITRE .pdf files converted to text (these are Unicode text files), plus one additional manually created ASCII .txt file, all randomly scattered across user file folders (but not used as attachments); and (4) one additional complex .json file, including complex, formatted attachments and a .json field with an HTML-formatted body. Five .pshift files were also included. Assuming that the test corpus is placed in the directory “corpus/enron_plus_mitre” relative to the main system, a test run using all of the provided .pshift files, with specified deltas of −500 days and +630 minutes, can be run as follows: python batch_process_json.py corpus/enron_plus_mitre -d -500 -m 630 -l log_1.txt -p KenLay.pshift -p DougGilbert-smith.pshift -p NatalieMcCarthy.pshift -p WandaCurry.pshift -p CarlReiber.pshift
  • FIG. 8 shows a sample screenshot partway through one test run of the system. Progress may be reported to the user after every 100 e-mails and non-attachment files have been processed. The log file (not shown here), which can be examined while the system is running or afterward, may contain much more detailed information. The system may terminate after all e-mails and files have been processed. Alternatively, a daemon may run in the background and automatically process new users, e-mails, or files whenever they appear.
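The daemon mode and periodic progress reports can be sketched with simple polling. Polling is a swapped-in technique for illustration; a real deployment might use filesystem notifications instead. The function names are hypothetical, and the sketch assumes output goes to “Changed_Emails”/“Changed_Files” folders, which it skips so that the daemon does not reprocess its own output.

```python
import time
from pathlib import Path

OUTPUT_DIRS = {"Changed_Emails", "Changed_Files"}  # assumed output folder names


def find_new_files(base_dir, seen):
    """Return input files under base_dir not seen before, and record them in `seen`."""
    base = Path(base_dir)
    new = []
    for path in sorted(base.rglob("*")):
        if not path.is_file() or path in seen:
            continue
        if OUTPUT_DIRS.intersection(path.relative_to(base).parts):
            continue  # skip the system's own output
        seen.add(path)
        new.append(path)
    return new


def run_daemon(base_dir, handle, poll_seconds=5.0, max_polls=None, report_every=100):
    """Poll for new users, e-mails and files, processing each as it appears."""
    seen, processed, polls = set(), 0, 0
    while max_polls is None or polls < max_polls:
        for path in find_new_files(base_dir, seen):
            handle(path)
            processed += 1
            if processed % report_every == 0:
                print(f"Processed {processed} items")
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)
    return processed
```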
  • In addition to the required and optional command line arguments, the user may also configure many different aspects of the system through a configuration file. These configurable properties may include: (1) the default name of the log file; (2) the names of the subdirectories for original and modified files and e-mails in the corpus; (3) the expected fields in the .json files; (4) the probabilities determining how often fake e-mails are randomly generated; (5) the content of the templates for generating fake e-mails; (6) the range of random offsets from the base e-mails for timestamps of fake e-mails; (7) whether or not to delete original e-mails and files after the modified versions have been created; and (8) the user information for the user running the system, so they may be notified when an unauthorized user has been lured to a fake website. In general, these properties tend to be more technical properties that are not likely to change frequently between runs of the system.
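The text does not specify the configuration file format. As an assumption, an INI-style file read with `configparser` could hold several of the listed properties; every section name, key, and value below is hypothetical.

```python
import configparser

DEFAULT_CONFIG = """
[logging]
log_file = dgs.log

[directories]
emails = Email
files = Files
changed_emails = Changed_Emails
changed_files = Changed_Files

[fake_emails]
probability = 0.05
min_offset_minutes = 1
max_offset_minutes = 720

[cleanup]
delete_originals = no

[notify]
admin_email = security@example.com
"""


def load_config(text=DEFAULT_CONFIG):
    """Parse the INI text into the handful of settings this sketch uses."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    return {
        "log_file": cfg.get("logging", "log_file"),
        "changed_emails": cfg.get("directories", "changed_emails"),
        "fake_probability": cfg.getfloat("fake_emails", "probability"),
        "offset_range": (cfg.getint("fake_emails", "min_offset_minutes"),
                         cfg.getint("fake_emails", "max_offset_minutes")),
        "delete_originals": cfg.getboolean("cleanup", "delete_originals"),
        "notify": cfg.get("notify", "admin_email"),
    }
```

Fake e-mail templates and expected .json fields could live in further sections in the same way.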
  • E. Examining System Output
  • FIG. 9 shows an example e-mail from the test corpus in its original .json format (left) and after being modified by the system using the example command shown earlier (right). Note that in this particular case, the body of the e-mail contained one date, which was detected and shifted, and one name that matched our example “KenLay.pshift” file (shown earlier), which was also detected and modified. Additionally, the “DateSent” and “TimeSent” fields of the e-mail were shifted.
  • The system may include a json_diff utility, written in Python and runnable from the command line, which displays the differences between two specified json files in a diff-like format. FIG. 10 shows a screenshot displaying the output of the json_diff utility used to compare the original and modified .json files displayed in FIG. 9.
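A utility of this kind can be sketched with `difflib.unified_diff` over pretty-printed JSON with sorted keys, so each field appears on its own line and changed fields show up as `-`/`+` pairs. The function names are hypothetical and the actual utility's output format may differ.

```python
import difflib
import json


def json_diff(original, modified, from_name="original", to_name="modified"):
    """Return unified-diff lines comparing two JSON documents."""
    def lines(doc):
        return json.dumps(doc, indent=2, sort_keys=True).splitlines()
    return list(difflib.unified_diff(lines(original), lines(modified),
                                     fromfile=from_name, tofile=to_name, lineterm=""))


def main(argv):
    """Entry point taking two file paths, e.g. main(sys.argv)."""
    with open(argv[1], encoding="utf-8") as a, open(argv[2], encoding="utf-8") as b:
        print("\n".join(json_diff(json.load(a), json.load(b), argv[1], argv[2])))
```

Calling `main(sys.argv)` under an `if __name__ == "__main__":` guard would make the sketch runnable from the command line.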
  • To compare modified .docx files or .pdf files with the corresponding originals, the user may need to open both files and compare them by eye. Of course, for these file types, we are interested not only in the content that has changed, but also in ensuring that the formatting has stayed the same, or has changed in an acceptable manner.
  • FIG. 11 shows an excerpt from a .docx file from the test corpus in its original state (left) and after being processed by the sample command shown earlier (right). This particular .docx file was one of the attachments of our complex .json file; the body contains the text of an e-mail from the Enron dataset, formatted in a variety of ways. Two dates were detected and modified where they appear in the document. In addition, the .pshift file used indicates that the name “Wanda Curry” should change to the name “Melanie Curtis”, and that only the first name is required for a match. The name was detected in two locations in this excerpt, but missed in a third location.
  • FIGS. 12A and 12B show an excerpt from a .pdf file in the test corpus in its original state (FIG. 12A) and after being processed by the sample command shown earlier (FIG. 12B). This particular document came from the MITRE corpus. On close examination, this excerpt contains just a single date (specifying a year), and it was correctly detected and shifted. In addition, one of the .pshift files that we have been using for testing indicates that the name “Carl Reiber” should be changed to “Derek Hunt,” that only the last name is required, and that the first name can be represented with just the initial “C.” In this excerpt, five instances of the name were detected and modified, and two instances were missed. The token near the top of the document, “Spotlight_Reiber_ver4 July28,” may contain a name and a date within a larger, single token; our system will not be able to identify concepts that occupy only part of a larger token. The modified document looks similar to the original and entirely reasonable. As explained earlier, formatting of .pdf files is only approximately maintained. Note also that metadata about the document, such as the title that appears in the title bar, is preserved.
  • Various other modifications will be obvious to a person of skill in the art without deviating from the inventions claimed herein.

Claims (15)

1. (canceled)
2. (canceled)
3. A method for detecting an unauthorized access comprising:
providing fake login credentials;
providing a website that presents one or more fields in which login credentials may be entered;
receiving the fake login credentials in the one or more fields;
generating an alert upon receiving the fake login credentials;
transmitting the alert.
4. The method of claim 3 wherein the alert is transmitted by email.
5. The method of claim 4 wherein the alert indicates that the login credentials have been received in the one or more fields.
6. The method of claim 5 wherein the website is a fake website.
7. The method of claim 3 wherein the alert indicates that the login credentials have been received in the one or more fields.
8. The method of claim 3 wherein the website is a fake website.
9. A system for detecting an unauthorized access comprising:
a server having a computer readable storage medium;
machine readable code stored on said computer readable storage medium;
fake login credentials stored on said computer readable storage medium;
wherein said machine readable code includes instructions for rendering a website that presents one or more fields in which login credentials may be entered; and
wherein said machine readable code includes instructions capable of generating an alert and transmitting the alert upon receipt of the fake login credentials in the one or more fields.
10. The system of claim 9 wherein the alert is transmitted by email.
11. The system of claim 10 wherein the alert indicates that the login credentials have been entered in the one or more fields.
12. The system of claim 11 wherein the website is a fake website.
13. The system of claim 9 wherein the alert indicates that the login credentials have been entered in the one or more fields.
14. The system of claim 9 wherein the website is a fake website.
15. The system of claim 9, wherein the alert is transmitted to a device that is remote from the system.
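The method of claim 3 (providing fake credentials, receiving them in a website's login fields, and generating and transmitting an alert, by e-mail per claim 4) can be illustrated with this sketch. It is not the claimed implementation: the function names, the mapping from fake credentials to the user whose data contained them, and the example.com addresses are all hypothetical. The alert is built with `email.message.EmailMessage`, and sending it through `smtplib` is shown separately.

```python
import smtplib
from email.message import EmailMessage


def check_login(username, password, fake_credentials, notify_address):
    """Return an alert e-mail if a planted fake credential pair was entered, else None.

    fake_credentials maps (username, password) -> the user whose data held the decoy.
    """
    victim = fake_credentials.get((username, password))
    if victim is None:
        return None  # not a decoy credential; the website handles it normally
    alert = EmailMessage()
    alert["To"] = notify_address
    alert["From"] = "decoy-monitor@example.com"
    alert["Subject"] = f"Decoy credentials for {victim} were used"
    alert.set_content(
        f"The fake login '{username}' planted in {victim}'s data was entered "
        "on the decoy website, which suggests the data was stolen."
    )
    return alert


def transmit(alert, smtp_host="localhost"):
    """Send the alert through an SMTP server (host is an assumption)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(alert)
```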
US16/680,873 2015-08-10 2019-11-12 Generating highly realistic decoy email and documents Abandoned US20200084238A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/680,873 US20200084238A1 (en) 2015-08-10 2019-11-12 Generating highly realistic decoy email and documents

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201562202997P 2015-08-10 2015-08-10
US15/233,563 US10476908B2 (en) 2015-08-10 2016-08-10 Generating highly realistic decoy email and documents
US16/680,873 US20200084238A1 (en) 2015-08-10 2019-11-12 Generating highly realistic decoy email and documents

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US15/233,563 Continuation US10476908B2 (en) 2015-08-10 2016-08-10 Generating highly realistic decoy email and documents

Publications (1)

Publication Number Publication Date
US20200084238A1 true US20200084238A1 (en) 2020-03-12

Family

ID=58500184

Family Applications (2)

Application Number Title Priority Date Filing Date
US15/233,563 Expired - Fee Related US10476908B2 (en) 2015-08-10 2016-08-10 Generating highly realistic decoy email and documents
US16/680,873 Abandoned US20200084238A1 (en) 2015-08-10 2019-11-12 Generating highly realistic decoy email and documents

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US15/233,563 Expired - Fee Related US10476908B2 (en) 2015-08-10 2016-08-10 Generating highly realistic decoy email and documents

Country Status (1)

Country Link
US (2) US10476908B2 (en)

Also Published As

Publication number Publication date
US10476908B2 (en) 2019-11-12
US20170104785A1 (en) 2017-04-13

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION