CN116451186A - Sensitive data security protection method and system - Google Patents

Info

Publication number
CN116451186A
Authority
CN
China
Prior art keywords
content
marked
sensitive data
cut
code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310431940.5A
Other languages
Chinese (zh)
Other versions
CN116451186B (en)
Inventor
徐浩
罗剑芳
罗维佳
吴勇
丁卓
朱凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Zhangdong Intelligent Technology Co ltd
Original Assignee
Guangzhou Zhangdong Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Zhangdong Intelligent Technology Co ltd
Priority to CN202310431940.5A
Publication of CN116451186A
Application granted
Publication of CN116451186B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/12 Protecting executable software
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/31 Programming languages or programming paradigms
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/543 User-generated data transfer, e.g. clipboards, dynamic data exchange [DDE], object linking and embedding [OLE]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Technology Law (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Storage Device Security (AREA)

Abstract

The application discloses a sensitive data security protection method and system. The method includes: monitoring a browser and the clipboard, and, when content copied or cut to the clipboard is detected as originating from a preset IP address or domain name in the browser, applying a first mark to the clipboard content; and, when the marked clipboard content is pasted into a code editing tool, applying a second mark to the pasted content in the code editing tool. The application can guard against security risks.

Description

Sensitive data security protection method and system
Technical Field
The application relates to software technology, in particular to a sensitive data security protection method and system.
Background
With the development of AI technology, writing code with AI has become a way of developing software. However, this approach carries security hazards: on the one hand, some AI-generated code may contain defects or vulnerabilities; on the other hand, when data is processed by AI, the AI model may learn sensitive data, which causes security problems.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. Therefore, the invention provides a sensitive data security protection method and a sensitive data security protection system, which are used for preventing sensitive data from being leaked and preventing code risks.
In one aspect, an embodiment of the present application provides a method for protecting sensitive data security, including:
monitoring a browser and the clipboard, and, when content copied or cut to the clipboard is detected as originating from a preset IP address or domain name in the browser, applying a first mark to the clipboard content;
when the marked clipboard content is pasted into a code editing tool, applying a second mark to the pasted content in the code editing tool.
In some embodiments, the method further comprises the steps of:
prohibiting the clipboard content from being pasted into the browser when it is detected that the content copied or cut to the clipboard originates from a local file marked as sensitive data;
the sensitive data includes user information and code.
In some embodiments, the method further comprises the steps of:
when content bearing the first mark is pasted into a local file, applying a third mark to the local file;
when content from a file bearing the third mark is pasted into the code editing tool, applying the second mark to the pasted content in the code editing tool.
In some embodiments, the method further comprises the steps of:
when part or all of the content of a file bearing the third mark is copied and pasted into a second local file, applying the third mark to the second local file.
In some embodiments, applying the second mark specifically comprises:
marking the marked code segment by highlighting or bolding;
wherein each character is marked independently.
In some embodiments, the marked code segments are configured to be visible to users with a preset permission level.
In some embodiments, the method further comprises the steps of:
recording the event when it is detected that the content copied or cut to the clipboard originates from a local file marked as sensitive data.
In some embodiments, when the third mark is applied, the path of the marked file is written into a marking document.
In some embodiments, the preconditions for applying the first mark to the clipboard content include:
detecting the composition of the clipboard content, and determining that the clipboard content is code when the proportion of specific punctuation marks it contains is greater than a threshold.
In another aspect, an embodiment of the present application provides a sensitive data security protection system, including:
a memory for storing a program;
a processor for loading the program to execute the sensitive data security protection method.
According to the embodiments of the present application, the browser and the clipboard are monitored; when content copied or cut to the clipboard is detected as originating from a preset IP address or domain name in the browser, a first mark is applied to the clipboard content; and when the marked clipboard content is pasted into a code editing tool, a second mark is applied to the pasted content in the code editing tool. In this way, AI-generated code can be marked, which reduces the risk it poses when it enters the software architecture; such code can also be discovered promptly when developers copy it into the code base, which facilitates internal inspection of code quality and assessment of security risks.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are required to be used in the description of the embodiments will be briefly described.
FIG. 1 is a flow chart of a method for secure protection of sensitive data according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a marking process provided in an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present application more clear, the technical solutions of the present application will be clearly and completely described by implementation with reference to the accompanying drawings in the examples of the present application, and it is apparent that the described examples are some, but not all, examples of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Large language models (LLMs) are deep learning models trained on large amounts of text data that can generate natural language text or understand its meaning. They can handle a variety of natural language tasks, such as text classification, question answering, and dialogue, and are an important path toward artificial intelligence. As these models have grown in scale, their parameter counts have risen from tens of millions to billions, and such tools have become able to generate code with specific functionality according to the user's instructions. However, because users of a model have no control over its training data, it is uncertain whether the solutions the model generates carry risks. Especially when a solution is sufficiently complex, it is uncertain whether it contains a vulnerability.
As the technology develops, these large language models are becoming more intelligent: they can not only reply to the user but also go on to execute instructions. Most AI models also allow users to submit data to them for learning. If the user handles this improperly, the submitted data may be learned by the AI model, and some of it may even be stolen, creating a significant security risk.
Referring to fig. 1, an embodiment of the present application discloses a sensitive data security protection method, which is mainly applied in development environments of developers, including personal computers, servers and the like of the developers, and mainly focuses on preventing risks generated by using an AI model, and may be embedded into some current security management systems as a detection branch for the AI model. The method comprises the following steps:
S1, monitoring a browser and the clipboard, and, when content copied or cut to the clipboard is detected as originating from a preset IP address or domain name in the browser, applying a first mark to the clipboard content. It will be appreciated that a program launched at system startup can monitor the browser and the clipboard in the background; while the user is on a particular website (identified by its IP address or domain name), the clipboard is monitored, and content that the user copies from that website is marked. This step mainly serves to detect whether the user has generated code through an AI code-generation website. Such code may be of poor quality, contain vulnerabilities, or carry other risks, all of which affect the quality and security of the software project. The resulting software is prone to vulnerabilities, and, particularly now that AI can write fairly complex code, such problems are not easily found by manual inspection. The purpose of this step is to monitor this copying behavior.
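The source check in step S1 can be illustrated with a minimal sketch. The patent does not specify how the browser's current address is obtained or how the preset list is stored, so the watch list, the `ClipboardEntry` type, and the function names here are hypothetical; only the idea of matching the source page against a preset IP address or domain name comes from the text.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from urllib.parse import urlsplit

# Hypothetical watch list; the patent only says "preset IP address or domain name".
WATCHED_DOMAINS = {"ai-codegen.example"}
WATCHED_IPS = {ip_address("203.0.113.10")}


@dataclass
class ClipboardEntry:
    text: str
    first_mark: bool = False  # the "first mark" of step S1


def is_watched_source(url: str) -> bool:
    """Return True when the page URL points at a preset domain or IP address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        return ip_address(host) in WATCHED_IPS
    except ValueError:
        pass  # not an IP literal, so treat the host as a domain name
    host = host.rstrip(".")
    # Match the domain itself or any of its subdomains.
    return any(host == d or host.endswith("." + d) for d in WATCHED_DOMAINS)


def on_copy(text: str, source_url: str) -> ClipboardEntry:
    """Build the clipboard entry, applying the first mark for watched sources."""
    return ClipboardEntry(text=text, first_mark=is_watched_source(source_url))
```

In this sketch a copy from `https://chat.ai-codegen.example/...` or from `http://203.0.113.10:8080/` yields an entry with `first_mark` set, while a copy from any other site does not.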
In the present embodiment, the main monitoring object is code. The content to be marked can therefore be narrowed further: if the clipboard is determined to hold software code, it is monitored further; if it is determined not to be code, it may be left unmarked, that is, not monitored.
Whether the clipboard content is code can be determined as follows. Preferably, the composition of the clipboard content is examined, and the content is determined to be code when the proportion of specific symbols it contains is greater than a threshold. The principle is that code typically consists of English words together with the symbols and punctuation marks of a particular programming language. Ordinary sentences, for example, contain relatively few brackets, semicolons, and the like, and the proportion of punctuation marks to the total number of characters in them is relatively low. Therefore, by measuring the proportion of punctuation marks that match common code syntax within the whole content, it is possible to recognize simply and fairly accurately whether the content is code.
Alternatively, the pasted content may be encoded and fed into a trained model (such as a language model or an SVM model) for classification, thereby determining whether the content is code. This approach, however, is relatively costly.
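The punctuation-ratio test described above can be sketched as follows. The patent names neither the symbol set nor the threshold, so `CODE_PUNCTUATION` and `THRESHOLD` are illustrative assumptions that would need tuning on real samples.

```python
# Assumed symbol set: characters that are common in code but rare in prose.
CODE_PUNCTUATION = set("{}[]();=<>+-*/%&|")
# Assumed threshold; the patent only requires "greater than a threshold".
THRESHOLD = 0.08


def code_punctuation_ratio(text: str) -> float:
    """Share of code-like symbols among the non-whitespace characters."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c in CODE_PUNCTUATION for c in chars) / len(chars)


def looks_like_code(text: str, threshold: float = THRESHOLD) -> bool:
    """Classify the clipboard content as code when the ratio exceeds the threshold."""
    return code_punctuation_ratio(text) > threshold
```

A C-style loop such as `for (int i = 0; i < n; i++) { sum += a[i]; }` is dense in brackets, operators, and semicolons and is classified as code, whereas an ordinary sentence that contains none of these symbols is not.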
S2, when the marked clipboard content is pasted into the code editing tool, applying a second mark to the pasted content in the code editing tool. It will be appreciated that, once the clipboard content has been marked, the listener sends a notification to the code editing tool when it detects that the clipboard content is being pasted there, and a marking plug-in in the code editing tool marks the code copied into it. Each character is marked. The marks may be visible only to users with specific permissions or to everyone, and may take the form of highlighting, bolding, and the like.
S3, when it is detected that the content copied or cut to the clipboard originates from a local file marked as sensitive data, prohibiting the clipboard content from being pasted into the browser;
the sensitive data includes user information and code.
It will be appreciated that local code files and files related to user privacy may be marked as sensitive files, and a list of sensitive files is loaded when the listener starts. The listener watches each foreground program (that is, the program the user is currently operating) and, if the object that program is operating on is sensitive data, monitors the user's copying behavior. If such data were copied into the browser and transmitted to an AI model, leakage could occur. Compared with the prior art, the main focus of the present application is on guarding against AI models: this covers both the adverse effects that AI-generated code may have once it is incorporated into a software project and the security problems, such as data leakage, that arise when code is fed into a third-party AI model.
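One way to picture the per-character second mark is a text buffer that stores a flag next to every character. This is only a sketch of the data structure; the patent does not describe how the marking plug-in stores its marks, and the `MarkedBuffer` class and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MarkedBuffer:
    """Editor text in which every character carries its own mark flag."""
    chars: list[str] = field(default_factory=list)
    marks: list[bool] = field(default_factory=list)

    def insert(self, pos: int, text: str, marked: bool) -> None:
        """Insert typed text (marked=False) or pasted marked content (marked=True)."""
        self.chars[pos:pos] = list(text)
        self.marks[pos:pos] = [marked] * len(text)

    def delete(self, start: int, end: int) -> None:
        """Delete the half-open character range [start, end)."""
        del self.chars[start:end]
        del self.marks[start:end]

    def text(self) -> str:
        return "".join(self.chars)

    def marked_spans(self) -> list[tuple[int, int]]:
        """Half-open ranges that a plug-in could render highlighted or bold."""
        spans, start = [], None
        for i, marked in enumerate(self.marks):
            if marked and start is None:
                start = i
            elif not marked and start is not None:
                spans.append((start, i))
                start = None
        if start is not None:
            spans.append((start, len(self.marks)))
        return spans
```

Because the flag belongs to each character rather than to the pasted block as a whole, typing inside a pasted segment splits it into two marked spans, and deleting most of the segment still leaves a marked span for whatever characters remain.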
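The paste restriction of step S3 amounts to a small policy check, sketched here under stated assumptions: the sensitive file list, the browser process names, and the `allow_paste` function are all hypothetical, and intercepting the actual paste operation is platform-specific and not shown.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical sensitive file list, loaded when the listener starts.
SENSITIVE_FILES = {
    PurePosixPath("/work/app/users.csv"),
    PurePosixPath("/work/app/billing.py"),
}
# Hypothetical names of foreground processes that count as "the browser".
BROWSER_PROCESSES = {"chrome", "firefox", "msedge"}


@dataclass
class ClipboardEntry:
    text: str
    source_file: Optional[PurePosixPath] = None  # file the content was copied from


def allow_paste(entry: ClipboardEntry, target_process: str) -> bool:
    """Refuse pastes into a browser when the content came from a sensitive file."""
    from_sensitive = entry.source_file in SENSITIVE_FILES
    to_browser = target_process.lower() in BROWSER_PROCESSES
    return not (from_sensitive and to_browser)
```

With this policy, content copied from `/work/app/users.csv` can still be pasted into an editor but not into a browser window, and content with no sensitive origin is unaffected.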
In some embodiments, the method further comprises the steps of:
S4, when content bearing the first mark is pasted into a local file, applying a third mark to the local file.
S5, when content from a file bearing the third mark is pasted into the code editing tool, applying the second mark to the pasted content in the code editing tool.
It will be appreciated that when marked content is copied into a local file, the local file may itself be marked; the listener keeps track of files that may contain AI-generated code by maintaining a list. Thus, if the user first copies AI-generated code into a local file and then from that file into the code editing tool, the AI-generated code is still marked.
S6, when part or all of the content of a file bearing the third mark is copied and pasted into a second local file, applying the third mark to the second local file. When the third mark is applied, the path of the marked file is written into a marking document.
It will be appreciated that this embodiment employs a contamination (taint propagation) mechanism: once a file is marked, both its copies and any document containing part of its content are marked as well. In this way, the user is prevented from bypassing the mechanism. The design aims to standardize developers' working practices: although AI-generated code cannot be kept out of the software project entirely (for example, a user may retype the code by hand instead of copying it), developers can be reminded or compelled to examine such code more carefully, which reduces the security risk. The marks left behind also make it easier to evaluate this code carefully during code review.
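The propagation of the third mark in steps S4 to S6 can be sketched as a small registry of marked file paths. The patent says only that a list is maintained and that paths are written into a marking document; the JSON format, the `TaintRegistry` class, and its method names are assumptions made for illustration, and paths are stored as given rather than resolved.

```python
import json
from pathlib import Path
from typing import Optional


class TaintRegistry:
    """Tracks files bearing the third mark and persists their paths."""

    def __init__(self, mark_document: Path):
        self.mark_document = mark_document  # the "marking document" (assumed JSON)
        self.paths: set[str] = set()
        if mark_document.exists():
            self.paths = set(json.loads(mark_document.read_text(encoding="utf-8")))

    def is_marked(self, path: str) -> bool:
        return str(Path(path)) in self.paths

    def mark(self, path: str) -> None:
        """Apply the third mark and write the path into the marking document."""
        self.paths.add(str(Path(path)))
        self.mark_document.write_text(json.dumps(sorted(self.paths)), encoding="utf-8")

    def is_tainted(self, first_marked: bool = False,
                   source_file: Optional[str] = None) -> bool:
        """Content is tainted if it bears the first mark or comes from a marked file."""
        return first_marked or (source_file is not None and self.is_marked(source_file))

    def on_paste_into_file(self, target: str, first_marked: bool = False,
                           source_file: Optional[str] = None) -> None:
        """S4 and S6: pasting tainted content marks the target file."""
        if self.is_tainted(first_marked, source_file):
            self.mark(target)

    def needs_second_mark(self, first_marked: bool = False,
                          source_file: Optional[str] = None) -> bool:
        """S2 and S5: tainted content pasted into the editor gets the second mark."""
        return self.is_tainted(first_marked, source_file)
```

In this sketch, pasting first-marked content into `notes.txt` marks that file, copying from `notes.txt` into `copy.txt` marks the copy as well, and content later pasted from either file into the editor still qualifies for the second mark.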
In some embodiments, applying the second mark specifically comprises:
marking the marked code segment by highlighting or bolding;
wherein each character is marked independently. Independent marking means that, during code editing, a trace remains as long as at least one marked character has not been deleted.
In some embodiments, the marked code segments are configured to be visible to users with a preset permission level, for example only to developers in senior positions. This allows the way other developers program with AI to be reviewed.
When it is detected that the content copied or cut to the clipboard originates from a local file marked as sensitive data, the event is recorded. It will be appreciated that, with this strategy, situations that may lead to data risk can be discovered and prevented in time.
In another aspect, an embodiment of the present application provides a sensitive data security protection system, including:
a memory for storing a program;
a processor for loading the program to execute the sensitive data security protection method.
Note that the above is only a preferred embodiment of the present application and the technical principle applied. Those skilled in the art will appreciate that the present application is not limited to the particular embodiments described herein, but is capable of numerous obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the present application. Therefore, while the present application has been described in connection with the above embodiments, the present application is not limited to the above embodiments, but may include many other equivalent embodiments without departing from the spirit of the present application, the scope of which is defined by the scope of the appended claims.

Claims (10)

1. A sensitive data security protection method, comprising:
monitoring a browser and the clipboard, and, when content copied or cut to the clipboard is detected as originating from a preset IP address or domain name in the browser, applying a first mark to the clipboard content;
when the marked clipboard content is pasted into a code editing tool, applying a second mark to the pasted content in the code editing tool.
2. The sensitive data security protection method of claim 1, further comprising the steps of:
prohibiting the clipboard content from being pasted into the browser when it is detected that the content copied or cut to the clipboard originates from a local file marked as sensitive data;
the sensitive data includes user information and code.
3. The sensitive data security protection method of claim 1, further comprising the steps of:
when content bearing the first mark is pasted into a local file, applying a third mark to the local file;
when content from a file bearing the third mark is pasted into the code editing tool, applying the second mark to the pasted content in the code editing tool.
4. The sensitive data security protection method of claim 1, further comprising the steps of:
when part or all of the content of a file bearing the third mark is copied and pasted into a second local file, applying the third mark to the second local file.
5. The sensitive data security protection method of claim 1, wherein applying the second mark specifically comprises:
marking the marked code segment by highlighting or bolding;
wherein each character is marked independently.
6. The sensitive data security protection method of claim 1, wherein the marked code segments are configured to be visible to users with a preset permission level.
7. The sensitive data security protection method of claim 1, further comprising the steps of:
recording the event when it is detected that the content copied or cut to the clipboard originates from a local file marked as sensitive data.
8. The sensitive data security protection method of claim 3, wherein, when the third mark is applied, the path of the marked file is written into a marking document.
9. The sensitive data security protection method of claim 1, wherein the preconditions for applying the first mark to the clipboard content include:
detecting the composition of the clipboard content, and determining that the clipboard content is code when the proportion of specific punctuation marks it contains is greater than a threshold.
10. A sensitive data security protection system, comprising:
a memory for storing a program;
a processor for loading the program to perform the sensitive data security protection method of any one of claims 1-9.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310431940.5A CN116451186B (en) 2023-04-21 2023-04-21 Sensitive data security protection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310431940.5A CN116451186B (en) 2023-04-21 2023-04-21 Sensitive data security protection method and system

Publications (2)

Publication Number Publication Date
CN116451186A 2023-07-18
CN116451186B 2023-11-17

Family

ID=87129955

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310431940.5A Active CN116451186B (en) 2023-04-21 2023-04-21 Sensitive data security protection method and system

Country Status (1)

Country Link
CN (1) CN116451186B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080028442A1 (en) * 2006-07-28 2008-01-31 Microsoft Corporation Microsoft Patent Group Copy-paste trust system
US9672366B1 (en) * 2015-03-31 2017-06-06 Symantec Corporation Techniques for clipboard monitoring
CN111858094A (en) * 2020-07-14 2020-10-30 北京海泰方圆科技股份有限公司 Data copying and pasting method and system and electronic equipment
CN115935347A (en) * 2021-05-31 2023-04-07 三六零数字安全科技集团有限公司 Clipboard protection method, device, equipment and storage medium
CN113656795A (en) * 2021-08-25 2021-11-16 奇安信科技集团股份有限公司 Window operation behavior audit method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
苏强: "《软件开发知识产权管理工具的设计和实现》", 《信息科技》, no. 11, pages 26 - 39 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117290890A (en) * 2023-11-24 2023-12-26 浙江口碑网络技术有限公司 Security risk management and control method and device, electronic equipment and storage medium
CN117290890B (en) * 2023-11-24 2024-05-10 浙江口碑网络技术有限公司 Security risk management and control method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN116451186B (en) 2023-11-17

Similar Documents

Publication Publication Date Title
Augenstein et al. Factuality challenges in the era of large language models and opportunities for fact-checking
US7769787B2 (en) Method and system for maintaining originality-related information about elements in an editable object
US10409892B2 (en) Formatting data by example
CN107480551B (en) File management method and device
US8950005B1 (en) Method and system for protecting content of sensitive web applications
CN103544446B (en) The method and apparatus that document is demarcated level of confidentiality
CN116451186B (en) Sensitive data security protection method and system
US9195808B1 (en) Systems and methods for proactive document scanning
CN102043920A (en) Access quarantine method of public file in data divulgence protection system
Laine et al. Understanding the ethics of generative AI: Established and new ethical principles
CN118606937A (en) APP sensitive feature detection method and system based on large-scale language model
Teppler Testable reliability: a modernized approach to ESI admissibility
Fischbach et al. Cira: A tool for the automatic detection of causal relationships in requirements artifacts
Anson et al. Plagiarism detection and intertextuality software
Falade Investigating the security and privacy issues in ChatGPT usage and their impact on organisational and individual security
Pereira et al. A chatbot assistant for writing good quality technical reports
CN105573686A (en) Identifying and printing control method for sensitive keywords in multiple documents
Householder et al. Lessons Learned in Coordinated Disclosure for Artificial Intelligence and Machine Learning Systems
Simard Position Paper: Should Machine Translation be Labelled as AI-Generated Content?
US20110246965A1 (en) Correcting document generation for policy compliance
Chen The lost data: how AI systems censor LGBTQ+ content in the name of safety
Hattori Wikigramming: a wiki-based training environment for programming
US20220067168A1 (en) Method and apparatus for detecting and remediating security vulnerabilities in computer readable code
Verkijk et al. Sunken Ships Shan't Sail: Ontology Design for Reconstructing Events in the Dutch East India Company Archives
US20240345837A1 (en) Auditable authorship attribution with automatically applied authorship tokens

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant