
Code processing methods, devices, computing equipment, and storage media

Info

Publication number
CN120994203A
Authority
CN
China
Prior art keywords
instruction
instructions
code
virtual
interpreter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202511095798.7A
Other languages
Chinese (zh)
Inventor
施熠晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Bilibili Technology Co Ltd
Original Assignee
Shanghai Bilibili Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Bilibili Technology Co Ltd
Priority to CN202511095798.7A
Publication of CN120994203A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/40: Transformation of program code
    • G06F8/41: Compilation
    • G06F8/44: Encoding
    • G06F8/447: Target code generation

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

This application discloses a code processing method, apparatus, computing device, and storage medium. The method includes: acquiring JavaScript source code to be protected; generating an instruction mapping table, wherein each generated instruction mapping table is unique; compiling the source code into target code containing virtual instructions according to the current instruction mapping table, wherein virtual instructions are instructions that have no actual operational meaning and are only used for mapping; generating an interpreter, wherein the interpreter matches the current instruction mapping table and is used to parse the currently compiled target code, parsing the virtual instructions in the target code into virtual machine instructions, wherein virtual machine instructions are instructions that the JavaScript engine can recognize and execute; and outputting the interpreter and the target code. In this way, this application achieves dynamic instruction obfuscation, constructs a code structure that is difficult to predict through historical code, increases the difficulty of reverse engineering the code, effectively prevents static decompilation analysis, and improves code security.

Description

Code processing method, device, computing equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a code processing method, a code processing device, a computing device, and a storage medium.
Background
With the rapid development of Internet technology and Web applications, JavaScript has become the mainstream scripting language in Web front-end development. It is widely used to implement rich page interactions as well as security-sensitive functions such as front-end risk control, anti-crawler detection, and device fingerprinting. However, because JavaScript is transmitted in plaintext and executed in the client browser, an attacker can easily analyze, decompile, and extract the code using the developer tools built into the browser (e.g. Chrome DevTools) or an abstract syntax tree (AST) parsing tool (e.g. Babel). This exposes the sensitive logic, algorithms, and business rules in the code and creates serious security risks.
The above information disclosed in this background section is only included to enhance understanding of the background of the disclosure and thus may contain information that does not form a related art that is already presently known to those of ordinary skill in the art.
Disclosure of Invention
The application provides a code processing method, a device, a computing device and a storage medium, which are used for solving the problem that the protection strength of a JavaScript protection scheme in the related art is insufficient.
The application adopts the following technical scheme.
In a first aspect, the present application provides a code processing method, including:
acquiring JavaScript source code to be protected;
generating an instruction mapping table, wherein the instruction mapping table is obtained by establishing a randomized mapping relationship between the operations of the source code and virtual instructions, and each generated instruction mapping table is unique;
compiling the source code into target code containing virtual instructions according to the current instruction mapping table, wherein a virtual instruction is an instruction that has no actual operational meaning and is used only for mapping;
generating an interpreter, wherein the interpreter matches the current instruction mapping table and is used to parse the target code generated by the current compilation, parsing the virtual instructions in the target code into virtual machine instructions, a virtual machine instruction being an instruction that the JavaScript engine can recognize and execute;
outputting the interpreter and the target code.
The application thus provides obfuscation protection for JavaScript code. A unique instruction mapping table is regenerated at each compilation, that is, the instruction mapping table is dynamic, and because the mapping relationship is randomized, each generated table is difficult to predict and restore, which strengthens code security. Likewise, an interpreter matching the current instruction mapping table is regenerated at each compilation. This achieves dynamic instruction obfuscation, constructs a code structure that is difficult to predict from historical code, increases the difficulty of reverse engineering the code, effectively hinders static decompilation analysis, and improves code security. The application is a pure-JavaScript obfuscation scheme: its output runs in a pure JS environment and can remain compatible with older browser versions, which addresses the incompatibility with older browsers and the conspicuous calling characteristics found in WebAssembly-based protection schemes.
With reference to the first aspect, in one possible implementation, establishing a randomized mapping relationship between the operations of the source code and the virtual instructions includes randomly ordering a plurality of abstract syntax tree nodes prepared in advance, and taking the sequence number of each node after the random ordering as its virtual instruction, thereby establishing a mapping relationship between the nodes and the virtual instructions.
Because the mapping relationship is established by randomly ordering the abstract syntax tree nodes, an attacker is effectively prevented from restoring it by statically analyzing those nodes, which further increases the difficulty of reverse analysis.
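The random-ordering step can be sketched as follows. This is a minimal illustration only: the node list `NODE_TYPES`, the helper names `shuffle` and `buildMappingTable`, and the use of a Fisher-Yates shuffle driven by `Math.random` are all assumptions, since the application does not name a shuffle algorithm or a random source.

```javascript
// Hypothetical list of pre-prepared AST node kinds (illustrative names only).
const NODE_TYPES = ['debugger', 'ifJump', 'typeof', 'params', '=', 'RegExpLiteral', 'get', 'tryNumber'];

// Fisher-Yates shuffle on a copy; `random` must return a float in [0, 1).
function shuffle(items, random = Math.random) {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// The position of each node after shuffling becomes its virtual instruction.
function buildMappingTable(nodeTypes, random = Math.random) {
  const table = new Map();
  shuffle(nodeTypes, random).forEach((node, index) => table.set(node, index));
  return table;
}

const mappingTable = buildMappingTable(NODE_TYPES);
```

A random shuffle makes it unlikely that two compilations share a table, but it does not by itself guarantee the uniqueness the application requires; a real implementation would need an additional check or a different source of randomness.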
With reference to the first aspect, in one possible implementation, the instruction mapping table includes a mapping relationship between nodes of an abstract syntax tree and virtual instructions, and the method further includes parsing the source code into an abstract syntax tree and preprocessing the abstract syntax tree to obtain a node sequence formed by arranging a plurality of nodes of the abstract syntax tree in order;
compiling the source code into target code containing virtual instructions according to the current instruction mapping table then includes querying the instruction mapping table and converting each node in the node sequence into its corresponding virtual instruction, thereby obtaining the target code containing the virtual instructions.
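As a rough sketch of this lookup step, the snippet below converts a node sequence into target code by querying a mapping table. The table contents, the function name `compileNodeSequence`, and the choice to represent target code as a plain array of numbers are assumptions made for illustration.

```javascript
// Hypothetical mapping table for one compilation: AST node kind -> virtual instruction.
const mappingTable = new Map([
  ['debugger', 5],
  ['ifJump', 2],
  ['typeof', 7],
  ['params', 0],
]);

// Replace every node in the sequence with the virtual instruction found in the table.
function compileNodeSequence(nodeSequence, table) {
  return nodeSequence.map((node) => {
    if (!table.has(node)) {
      throw new Error(`No virtual instruction for node "${node}"`);
    }
    return table.get(node);
  });
}

const targetCode = compileNodeSequence(
  ['debugger', 'ifJump', 'typeof', 'debugger', 'params'],
  mappingTable
);
// targetCode is [5, 2, 7, 5, 0]
```

Without the table, the numbers in `targetCode` carry no meaning, which is the sense in which virtual instructions are "used only for mapping".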
With reference to the first aspect, in one possible implementation, the instruction mapping table includes mapping relationships between nodes of the abstract syntax tree and virtual instructions, and generating the interpreter includes: for each virtual instruction in the instruction mapping table, querying a preset instruction set to obtain the virtual machine instruction corresponding to that virtual instruction, and establishing parsing logic between each virtual instruction and its corresponding virtual machine instruction, wherein the preset instruction set includes a plurality of abstract syntax tree nodes prepared in advance and the virtual machine instructions corresponding to those nodes.
Because the parsing logic between virtual instructions and virtual machine instructions is built into the interpreter from the instruction mapping table, the interpreter, once loaded at the front end, can quickly convert the virtual instructions of the target code into executable virtual machine instructions.
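One way to picture interpreter generation is as building a dispatch table keyed by the current compilation's virtual instructions. The sketch below assumes a tiny stack machine: the node kinds `push`, `add`, and `mul`, the object `presetInstructionSet`, and the function `generateInterpreter` are hypothetical and stand in for the much larger instruction set a real JSVmp-style tool would need.

```javascript
// Hypothetical preset instruction set: AST node kind -> virtual machine instruction (a JS function).
const presetInstructionSet = {
  push: (stack, arg) => { stack.push(arg); },
  add: (stack) => { const b = stack.pop(); const a = stack.pop(); stack.push(a + b); },
  mul: (stack) => { const b = stack.pop(); const a = stack.pop(); stack.push(a * b); },
};

// Build an interpreter whose dispatch table is keyed by this compilation's virtual instructions.
function generateInterpreter(mappingTable, instructionSet) {
  const dispatch = new Map();
  for (const [node, virtualInstruction] of mappingTable) {
    dispatch.set(virtualInstruction, instructionSet[node]);
  }
  return function interpret(targetCode) {
    const stack = [];
    for (const [virtualInstruction, arg] of targetCode) {
      const handler = dispatch.get(virtualInstruction);
      if (!handler) throw new Error(`Unknown virtual instruction ${virtualInstruction}`);
      handler(stack, arg);
    }
    return stack.pop();
  };
}

// One compilation's table: push -> 2, add -> 0, mul -> 1.
const mappingTable = new Map([['push', 2], ['add', 0], ['mul', 1]]);
const interpret = generateInterpreter(mappingTable, presetInstructionSet);

// (2 + 3) * 4, encoded with this table's virtual instructions.
const result = interpret([[2, 2], [2, 3], [0], [2, 4], [1]]);
// result is 20
```

An interpreter generated from a different mapping table would reject or misread this target code, which is why the interpreter and the target code must come from the same compilation.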
With reference to the first aspect, in one possible implementation, the method further includes performing syntax downgrading on the source code when the source code is acquired.
Downgrading the syntax of the source code compresses it at the syntax level and simplifies the instruction set, which speeds up compilation, reduces the size of the interpreter, and strengthens the obfuscation effect.
With reference to the first aspect, in one possible implementation, outputting the interpreter and the target code includes outputting the interpreter and the target code as a single JavaScript file after obfuscation processing.
Thus, in addition to dynamic instruction obfuscation, the interpreter and the target code are obfuscated and output as a single JavaScript file, which frustrates attempts to deduce the instruction set mapping relationship and fundamentally prevents an attacker from forging a custom interpreter.
With reference to the first aspect, in one possible implementation manner, the method includes creating an environment monitoring logic, where the environment monitoring logic is configured to determine whether an attribute of a host environment of the interpreter belongs to an automation framework feature attribute, and if yes, throw an error.
Therefore, the environment monitoring logic can effectively prevent the interpreter from executing in an abnormal browser environment (such as a debugger or an automation tool), prevent the code from being maliciously debugged or analyzed, and further improve the safety of the code.
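A minimal sketch of such a check is shown below. It relies on `navigator.webdriver`, a standard property that browsers set to `true` when they are controlled by automation; the marker list `AUTOMATION_MARKERS`, the function name `assertNotAutomated`, and the error message are assumptions, and the application does not say which attributes it actually inspects.

```javascript
// Hypothetical list of navigator properties treated as automation framework features.
const AUTOMATION_MARKERS = ['webdriver'];

// Throw if the host environment exposes any automation marker set to true.
// `host` is passed in (rather than read from a global) so the check is easy to test.
function assertNotAutomated(host) {
  const nav = (host && host.navigator) || {};
  for (const marker of AUTOMATION_MARKERS) {
    if (nav[marker] === true) {
      throw new Error('Unsupported execution environment');
    }
  }
}
```

In a browser, the interpreter could call `assertNotAutomated(globalThis)` before executing any target code. A single property check like this is easy to bypass, so it should be read as an illustration of the idea rather than a complete defense.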
In a second aspect, the present application also provides a code processing method, including:
obtaining an interpreter and target code, wherein the interpreter matches an instruction mapping table, the instruction mapping table and the interpreter are each generated once in every compilation of the JavaScript source code, each generated instruction mapping table is unique, and the instruction mapping table is obtained by establishing a randomized mapping relationship between the operations of the source code and virtual instructions;
and executing the parsing result, containing virtual machine instructions, that the interpreter obtains by parsing the target code, so as to realize the function corresponding to the source code, wherein the virtual machine instructions are instructions that the JavaScript engine can recognize and execute.
Because the mapping table and the interpreter change dynamically at each compilation, dynamic instruction obfuscation is achieved, a code structure that is difficult to predict from historical code is constructed, the difficulty of reverse engineering the code is increased, static decompilation analysis is effectively hindered, and code security is improved. Moreover, because the interpreter is generated together with the instruction mapping table at compile time and matches it, the interpreter restores the virtual machine instructions from the virtual instructions during parsing, which ensures that the protected code behaves the same as the source code.
In a third aspect, the application further provides a code processing device. The code processing apparatus comprises means for performing the code processing method of the first aspect or any of the alternative implementations of the first aspect. For example, the code processing apparatus includes:
a code acquisition module, configured to acquire the JavaScript source code to be protected;
a mapping table generation module, configured to generate an instruction mapping table each time before the source code is compiled into target code, wherein the instruction mapping table is obtained by establishing a randomized mapping relationship between the operations of the source code and virtual instructions, and each generated instruction mapping table is unique;
a source code compiling module, configured to compile the source code into target code containing virtual instructions according to the current instruction mapping table, wherein a virtual instruction is an instruction that has no actual operational meaning and is used only for mapping;
an interpreter generation module, configured to generate an interpreter, wherein the interpreter matches the current instruction mapping table and is used to parse the target code generated by the current compilation, parsing the virtual instructions in the target code into virtual machine instructions, a virtual machine instruction being an instruction that the JavaScript engine can recognize and execute;
and an output module, configured to output the interpreter and the target code.
For more detailed implementation details of the code processing apparatus reference is made to the description of any implementation of the first aspect above.
In a fourth aspect, the application further provides a code processing device. The code processing apparatus comprises means for performing the second aspect or the code processing method in any of the alternative implementations of the second aspect. For example, the code processing apparatus includes:
an acquisition module, configured to acquire an interpreter and target code, wherein the interpreter matches an instruction mapping table, the instruction mapping table and the interpreter are each generated once in every compilation of the JavaScript source code, each generated instruction mapping table is unique, and the instruction mapping table is obtained by establishing a randomized mapping relationship between the operations of the source code and virtual instructions;
and an execution module, configured to execute the parsing result, containing virtual machine instructions, that the interpreter obtains by parsing the target code, so as to realize the function corresponding to the source code, wherein the virtual machine instructions are instructions that the JavaScript engine can recognize and execute.
For more details of implementation of the code processing apparatus reference is made to the description of any implementation of the above second aspect.
In a fifth aspect, the present application provides a computing device. The computing device comprises a processor and a memory, the memory storing a computer program or instructions which, when executed by the processor, implement the method of the first aspect or any one of the possible implementations of the first aspect, or implement the method of the second aspect or any one of the possible implementations of the second aspect.
In a sixth aspect, the present application provides a computer-readable storage medium. The storage medium has stored therein a computer program or instructions which, when executed by a processor, implement the method of the first aspect or any of the possible implementations of the first aspect or implement the method of the second aspect or any of the possible implementations of the second aspect.
In a seventh aspect, the present application provides a computer program product. The computer program product comprises a computer program or instructions which, when executed by a processor, implement the method of the first aspect or any of the possible implementations of the first aspect or implement the method of the second aspect or any of the possible implementations of the second aspect.
The advantages of the above second to seventh aspects may refer to the first aspect or any possible implementation manner of the first aspect, and are not described herein. Further combinations of the present application may be made to provide further implementations based on the implementations provided in the above aspects.
Additional advantages, objects, and features of the application will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings that are required to be used in the embodiments or the related technical descriptions will be briefly described, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained from the structures shown in the drawings without inventive effort to those skilled in the art.
FIG. 1 is a first flowchart of a code processing method according to an exemplary embodiment of the present application;
FIG. 2 is a first flowchart of generating an interpreter;
FIG. 3 is a second flowchart of generating an interpreter;
FIG. 4 is a second flowchart of a code processing method according to an exemplary embodiment of the present application;
FIG. 5 is a third flowchart of a code processing method according to an exemplary embodiment of the present application;
FIG. 6 is a first flowchart of an interpreter parsing target code;
FIG. 7 is a second flowchart of an interpreter parsing target code;
FIG. 8 is a first functional block diagram of a code processing apparatus according to an exemplary embodiment of the present application;
FIG. 9 is a second schematic block diagram of a code processing apparatus according to an exemplary embodiment of the present application;
FIG. 10 is a functional block diagram of a computing device according to an exemplary embodiment of the present application;
FIG. 11 is a schematic diagram of an implementation environment of a code processing method according to an embodiment of the present application;
FIG. 12 is a JSVmp block diagram.
Detailed Description
In the present application, the words "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items. In the present application, "at least one" means one or more, and "a plurality" means two or more. The terms including ordinal numbers such as "first", "second", and the like used in the present application may be used to describe various constituent elements, but these constituent elements are not limited by these terms. These terms are used only for distinguishing one component from another and are not to be construed as indicating or implying relative importance. For example, a first component may be termed a second component, and, similarly, a second component may be termed a first component, without departing from the scope of the present application.
Before describing embodiments of the present application, technical terms and background related to the present application will be described first.
Headless browser: a browser running mode in which the complete browser engine is controlled through a command-line interface to perform page rendering and script parsing.
Control flow flattening: a code obfuscation technique that converts nested code logic into a linear execution structure and controls the execution order of basic blocks through a dispatcher.
Pseudo-random algorithm: a deterministic value generation method that, initialized with a seed value, produces a reproducible sequence statistically similar to random numbers.
Abstract syntax tree (AST): a structural representation of source code that precisely describes the hierarchical relationships between the syntax elements of a program in a tree data structure.
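To make the pseudo-random algorithm term concrete, the sketch below shows a seeded generator whose output is fully determined by its seed. It is a linear congruential generator using the widely published Numerical Recipes constants; the function name `createSeededRandom` is hypothetical, and nothing in the application indicates which generator, if any, is actually used.

```javascript
// Linear congruential generator; deterministic and NOT cryptographically secure.
function createSeededRandom(seed) {
  let state = seed >>> 0; // keep the state as an unsigned 32-bit integer
  return function next() {
    // Math.imul performs the 32-bit multiplication without losing precision.
    state = (Math.imul(1664525, state) + 1013904223) >>> 0;
    return state / 4294967296; // float in [0, 1)
  };
}

const a = createSeededRandom(42);
const b = createSeededRandom(42);
const sequenceA = [a(), a(), a()];
const sequenceB = [b(), b(), b()];
// sequenceA and sequenceB are identical because both generators start from the same seed.
```

Reproducibility is useful for testing, but it also means that anyone who learns the seed can regenerate the sequence.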
With the rapid development of Internet technology and Web applications, JavaScript has become the mainstream scripting language in Web front-end development. It is widely used to implement rich page interactions as well as security-sensitive functions such as front-end risk control, anti-crawler detection, and device fingerprinting. However, because JavaScript is transmitted in plaintext and executed in the client browser, an attacker can easily analyze, decompile, and extract the code using the developer tools built into the browser (e.g. Chrome DevTools) or an abstract syntax tree (AST) parsing tool (e.g. Babel). This exposes the sensitive logic, algorithms, and business rules in the code and creates serious security risks.
To prevent sensitive JavaScript code from being easily analyzed and restored, the industry currently makes wide use of code obfuscation, which makes the code harder for an attacker to analyze by renaming variables, inserting redundant code, or modifying the code structure. However, such static obfuscation is essentially a simple transformation of the code text. It cannot fundamentally change how the code executes or its structural characteristics, so an attacker can still deobfuscate it with AST tools, manual analysis, and similar means and restore the original logic.
Recently, some research has proposed virtualization protection schemes for front-end JavaScript code, which attempt to hide the original execution logic by compiling the JavaScript code into specific virtual instructions. The idea is to translate the JavaScript code into WebAssembly (WASM) bytecode and execute it with the WebAssembly engine built into the browser. Although this approach has certain advantages in performance and security, it has clear drawbacks. First, WebAssembly is not supported by all browsers, and older browser versions in particular are incompatible, so the approach has poor compatibility and is difficult to apply widely. Second, the calling characteristics of WebAssembly are conspicuous, so an attacker can easily identify and analyze the module calling pattern and the instruction encoding rules. Third, existing WebAssembly virtualization schemes generally use a fixed instruction encoding and lack the ability to update and change dynamically, so the protection strength gradually declines as attackers accumulate analysis experience.
Existing front-end JavaScript protection schemes therefore still have clear shortcomings: JavaScript code stored in plaintext or in simply obfuscated form is easily extracted and analyzed with browser debugging tools; statically obfuscated code is easily reverse-analyzed and restored with AST tools; existing WebAssembly schemes have limited compatibility and cannot support all browsers; and virtualization protection schemes with a fixed instruction set are easily identified and cracked by attackers, so security is difficult to guarantee over the long term.
In summary, the related art has a problem that the JavaScript protection scheme is insufficient in protection strength. In order to solve the above problems, an embodiment of the present application provides a code processing method, which can greatly improve code security.
As shown in fig. 12, the present application is a JavaScript code virtualization protection scheme based on the JSVmp (JavaScript Virtual Machine Protection) approach. The idea is that each time the compiler compiles the source code, it dynamically generates an instruction mapping table and an interpreter matched with that table, and compiles the source code into virtual instructions according to the table. The compiled target code is executed in the simulated environment of a front-end virtual machine: the target code is interpreted by the interpreter generated at compile time, which matches the instruction mapping table and parses the virtual instructions into instructions that the front-end virtual machine (essentially, the JavaScript engine) can recognize and execute.
One or more exemplary operating environments are described first, to make the role and intent of the various implementations in the embodiments of the present application easier to understand. Fig. 11 is a schematic diagram of an implementation environment of a method provided by an embodiment of the present application. The method may be applied in the environment of fig. 11, which includes a development end, a server, and a client. The development end and the client are each communicatively connected to the server. The client includes terminal devices of any type, such as iOS or Android devices, for example smartphones, tablet computers, iMac, PAD, and the like. A server is to be construed broadly as an entity capable of responding to external requests and providing data, resources, or services; it may be a single server or a server cluster. The development end may have one or more computer devices on which a code development tool is installed, so that JavaScript source code can be developed. The source code can be written at the development end and sent to the server, where a compiler on the server side compiles it; alternatively, a compiler installed at the development end can do the compiling. In practice, a developer generally writes and compiles the JavaScript source code on a computer device at the development end, the JavaScript file output by the compilation is uploaded to the server, the client obtains the JavaScript file (JS file for short) from the server, and the JavaScript engine at the client loads and executes the JS file.
In the present application, the JavaScript source code to be protected is compiled into a JS file at the development end. The compiled JS file can be deployed in various client environments such as front-end web pages and applications, and any host environment capable of executing the JS file falls within the applicable scope.
The technical scheme of the application is described below through a plurality of embodiments. It should be understood that these embodiments may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein.
Referring to fig. 1, in a first aspect, the present application provides a code processing method. The method may be executed by a server (in which case the development end is only responsible for writing the source code) or by the development end, and the present application is not limited in this respect. In the following embodiments, the method is described by way of example as executed by the development end. As shown in fig. 1, the code processing method provided by the embodiment of the present application includes steps S101 to S109:
S101, acquiring JavaScript source code to be protected;
S103, generating an instruction mapping table, wherein each generated instruction mapping table is unique;
S105, compiling the source code into target code containing virtual instructions according to the current instruction mapping table, wherein a virtual instruction is an instruction that has no actual operational meaning and is used only for mapping;
S107, generating an interpreter, wherein the interpreter matches the current instruction mapping table and is used to parse the target code generated by the current compilation, parsing the virtual instructions in the target code into virtual machine instructions, a virtual machine instruction being an instruction that the JavaScript engine can recognize and execute;
S109, outputting the interpreter and the target code.
The application thus provides obfuscation protection for JavaScript code at the compile stage. A unique instruction mapping table is regenerated at each compilation, that is, the instruction mapping table is dynamic, and likewise an interpreter matching the current instruction mapping table is regenerated at each compilation. This achieves dynamic instruction obfuscation, constructs a code structure that is difficult to predict from historical code, increases the difficulty of reverse engineering the code, effectively hinders static decompilation analysis, and improves code security. In addition, because the obfuscation is pure JavaScript, the output runs in a pure JS environment and can remain compatible with older browser versions, which addresses the incompatibility with older browsers and the conspicuous calling characteristics found in WebAssembly-based protection schemes.
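Steps S103 to S109 can be sketched end to end on a toy stack machine, as below. Everything here is an illustrative assumption: the three node kinds in `INSTRUCTION_SET`, the `protect` function, and the representation of a node sequence as `[kind, argument]` pairs are hypothetical, and source parsing (S101/S201) is assumed to have produced the node sequence already.

```javascript
// Hypothetical virtual machine instructions for a tiny stack machine.
const INSTRUCTION_SET = {
  push: (stack, arg) => stack.push(arg),
  add: (stack) => stack.push(stack.pop() + stack.pop()),
  neg: (stack) => stack.push(-stack.pop()),
};

function protect(nodeSequence, random = Math.random) {
  // S103: shuffle the node kinds; the shuffled position is the virtual instruction.
  const kinds = Object.keys(INSTRUCTION_SET);
  for (let i = kinds.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [kinds[i], kinds[j]] = [kinds[j], kinds[i]];
  }
  const table = new Map(kinds.map((kind, index) => [kind, index]));

  // S105: replace each node with its virtual instruction.
  const targetCode = nodeSequence.map(([kind, arg]) => [table.get(kind), arg]);

  // S107: the generated interpreter only knows this compilation's numbering.
  const handlers = kinds.map((kind) => INSTRUCTION_SET[kind]);
  const interpreter = (code) => {
    const stack = [];
    for (const [virtualInstruction, arg] of code) {
      handlers[virtualInstruction](stack, arg);
    }
    return stack.pop();
  };

  // S109: output both artifacts together.
  return { interpreter, targetCode };
}

// -(2 + 5) expressed as a hypothetical node sequence.
const nodeSequence = [['push', 2], ['push', 5], ['add'], ['neg']];
const buildA = protect(nodeSequence);
const buildB = protect(nodeSequence);
```

Running `buildA.interpreter(buildA.targetCode)` and `buildB.interpreter(buildB.targetCode)` both yield the same value, while the numeric encodings in `buildA.targetCode` and `buildB.targetCode` may differ between compilations.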
Each of steps S101-S109, and optionally other steps, are described in detail below in conjunction with fig. 2-7.
S101, acquiring JavaScript source code to be protected;
A compiler (or compiling tool) is installed at the development end, and the method of this embodiment can be implemented by that compiler. After a developer writes the JavaScript source code to be protected with a development tool, the source code is imported into the compiler, so that the compiler obtains the JavaScript source code to be protected.
Referring to fig. 2, the method further includes a step S201, after the step S101, of parsing the source code into an abstract syntax tree, and preprocessing the abstract syntax tree to obtain a node sequence formed by arranging a plurality of nodes of the abstract syntax tree in order.
For example, the source code is parsed using Babel to generate an abstract syntax tree (also known as an AST), the AST is traversed for preprocessing, and finally the AST is converted into a node sequence. The node sequence is essentially formed by arranging a plurality of nodes of the abstract syntax tree (hereinafter abbreviated as AST nodes) in order. An AST node representation is a low-level representation.
Preprocessing an AST is a mature technique and includes collecting string literals into an array, control-flow flattening, adding dead code regions, adding random dead code blocks, converting object literals, renaming object properties, and the like. As shown in fig. 12, the preprocessing may be implemented by a preprocessor of the compiler.
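As an illustration of turning an AST into an ordered node sequence, the following minimal sketch flattens a small hand-written tree into a list of node-type names. The tree shape, the `type` and `children` field names, the ESTree-style type names, and the post-order traversal are all assumptions made for illustration; the embodiment uses Babel and does not specify the traversal order.

```javascript
// Hypothetical, hand-written AST for illustration only; a real compiler
// would obtain the tree from a parser such as Babel.
const ast = {
  type: "CallExpression",
  children: [
    {
      type: "MemberExpression",
      children: [
        { type: "Identifier", children: [] },
        { type: "Identifier", children: [] },
      ],
    },
    { type: "StringLiteral", children: [] },
  ],
};

// Post-order traversal: children are emitted before the node that consumes
// them, which suits a stack-based interpreter (an assumption of this sketch).
function toNodeSequence(node, out = []) {
  for (const child of node.children) toNodeSequence(child, out);
  out.push(node.type);
  return out;
}

const nodeSequence = toNodeSequence(ast);
console.log(nodeSequence);
```

The resulting list plays the role of the node sequence P in the text, although the node names of the embodiment (debugger, ifJump, tryNumber, and so on) are lower-level than the type names used here.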
For example, for the source code console.log('hello world'), the node sequence obtained after step S101 is P = [debugger, ifJump, typeof, debugger, params, '=', RegExpLiteral, debugger, typeof, get, RegExpLiteral, tryNumber], that is, a total of 12 AST nodes, as shown in table 1 below:
Table 1 list of node sequences P
Regarding S103, generating an instruction mapping table;
In the embodiments of the application, an instruction mapping table is generated once for each compilation, and each generated instruction mapping table is unique and is not reused. That is, the instruction mapping table generated each time can only be used for the current compilation and cannot be used for other compilations.
Generating the instruction mapping table includes establishing a randomized mapping relationship between operations of the source code and virtual instructions.
A virtual instruction is an instruction that has no actual operational meaning and is used only for mapping. For example, a virtual instruction may be a number such as 1, 2, or 3, a letter string such as a, b, c, ax, or dge, any symbol without operational meaning such as $, #, or %, or a mixture of any of these.
The randomized mapping relationship can be established with a pseudo-random algorithm. In one possible implementation, establishing the randomized mapping relationship between operations of the source code and virtual instructions includes randomly ordering (e.g., using a pseudo-random algorithm) a plurality of abstract syntax tree nodes prepared in advance, and taking the sequence number of each node after the random ordering as its virtual instruction, thereby establishing the mapping relationship between nodes and virtual instructions. That is, the instruction mapping table contains the mapping relationship between nodes of the abstract syntax tree (AST nodes) and virtual instructions.
For example, a plurality of AST nodes prepared in advance may be placed in an array a, each element in the array a representing one AST node:
A =['tryNumber', 'RegExpLiteral', 'try','ifJump', 'params', 'typeof','=', '++', 'def_v','--', 'localScope', 'property',......];
The Fisher-Yates shuffling algorithm (as shown by the shuffleArray function below) may then be applied to shuffle the order of the elements in array A:
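function shuffleArray(array: Array<string>): Array<string> {
  // Walk from the last element down, swapping each element with a
  // randomly chosen element at or before its position.
  for (let i = array.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [array[i], array[j]] = [array[j], array[i]];
  }
  return array;
}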
Passing array A to the function as its argument yields the shuffled array A1, for example:
A1=['new','globalScope','switchJump','ifJump','>>>','typeof','>>', '++', 'def_v','--','<<','property',......];
The sequence number of each element in A1 is its virtual instruction. Arranging the elements of A1 and their sequence numbers into table 2 below gives instruction mapping table MIT-1:
TABLE 2 instruction mapping table MIT-1
At the next compilation, the Fisher-Yates shuffling algorithm is applied again to shuffle the elements of array A, which may give:
A2=['tryNumber','RegExpLiteral','try','ifJump','params','typeof','=','++','def_v','--','localScope','property','debugger',......];
The sequence number of each element in A2 is its virtual instruction. Arranging the elements of A2 and their sequence numbers into table 3 below gives instruction mapping table MIT-2:
TABLE 3 instruction mapping table MIT-2
It can be seen that the instruction mapping table generated each time is different. That is, the virtual instruction corresponding to an AST node is not fixed: the sequence number (virtual instruction) of each element (AST node) in array A is reshuffled in every compilation. Establishing the mapping relationship by applying a pseudo-random ordering algorithm to the AST nodes therefore effectively prevents an attacker from recovering the mapping relationship through static analysis of the abstract syntax tree nodes, further increasing the difficulty of reverse analysis.
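The shuffle-then-number procedure can be sketched as follows. This is a minimal sketch, not the embodiment's implementation: `makeRng` and `buildInstructionMappingTable` are hypothetical names, and the small seeded generator is used only so that the example is reproducible, whereas a real compiler would draw fresh randomness for every compilation.

```javascript
// Small linear congruential generator so that the sketch is reproducible.
// The seed parameter is an assumption of this example only.
function makeRng(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // value in [0, 1)
  };
}

// Fisher-Yates shuffle on a copy of the node list, then use each node's
// position in the shuffled list as its virtual instruction.
function buildInstructionMappingTable(nodes, rng) {
  const shuffled = nodes.slice();
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  return new Map(shuffled.map((node, index) => [node, index]));
}

// The visible part of array A from the text (the elided tail is left out).
const A = ["tryNumber", "RegExpLiteral", "try", "ifJump", "params", "typeof",
  "=", "++", "def_v", "--", "localScope", "property"];

const table1 = buildInstructionMappingTable(A, makeRng(1));
const table2 = buildInstructionMappingTable(A, makeRng(2));
console.log([...table1.entries()]);
console.log([...table2.entries()]);
```

Each call returns a bijection between node names and the numbers 0 to n-1; storing it as a Map keyed by node name makes the lookup during compilation a single `get`.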
Regarding S105, compiling the source code into an object code containing virtual instructions according to the current instruction mapping table;
Specifically, compiling the source code into target code containing virtual instructions according to the current instruction mapping table includes querying the instruction mapping table and converting the nodes in the node sequence into the corresponding virtual instructions, thereby obtaining the target code containing virtual instructions.
It should be noted that the source code may be transcoded into virtual instructions entirely or only partially; that is, the target code may consist entirely of virtual instructions, or partly of virtual instructions with the remainder processed conventionally, and both cases fall within the scope of the present application. Whether transcoding is complete or partial can be controlled by configuring which AST nodes the instruction mapping table supports: only AST nodes supported by the instruction mapping table are transcoded into virtual instructions, while AST nodes not supported by the instruction mapping table are processed in the conventional manner. For example, source code generally includes operators (e.g., addition, subtraction, multiplication, and division) and operation statements (e.g., for loops, delete operations, etc.). In some embodiments, all operators and operation statements are covered by the AST nodes supported by the instruction mapping table; in one embodiment, only the operators need to be covered, which is sufficient to achieve the desired obfuscation effect.
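The distinction between supported and unsupported nodes can be sketched as a simple partition. In this minimal sketch, `classify` is a hypothetical helper name, the mapping table is a three-entry excerpt, and `forLoop` and `delete` are made-up node names standing in for nodes that the table does not cover.

```javascript
// Excerpt of a mapping table; only nodes listed here can become virtual
// instructions.
const mappingTable = new Map([["tryNumber", 0], ["ifJump", 3], ["typeof", 5]]);

// Split a node sequence into nodes the table supports (to be transcoded)
// and nodes left to conventional processing.
function classify(nodeSequence, table) {
  const virtualized = [];
  const conventional = [];
  for (const node of nodeSequence) {
    (table.has(node) ? virtualized : conventional).push(node);
  }
  return { virtualized, conventional };
}

const result = classify(["typeof", "forLoop", "ifJump", "delete"], mappingTable);
console.log(result);
```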
For example, suppose that in the current compilation the node sequence obtained in step S101 is P (table 1) and the instruction mapping table generated in step S103 is MIT-2 (table 3). Referring to tables 1 and 3, a series of virtual instructions (denoted instructions) is obtained, as shown in table 4 below:
Table 4 example of converting the AST nodes obtained in step S101 into virtual instructions
It can be seen that all nodes in the node sequence of step S101 (table 1) are AST nodes supported by instruction mapping table MIT-2 (table 3). Transcoding the node sequence against MIT-2 yields the series of virtual instructions in the second column of table 4, so the target code of step S105 consists entirely of virtual instructions. The target code finally obtained for the source code console.log('hello world') is a sequence of 12 virtual instructions, represented as the following array:
[12, 3, 5, 12, 4, 6, 1, 12, 5, 30, 1, 0];
It can be seen that the virtual instructions in the target code are a series of numbers (other characters are possible in other embodiments) that carry no operation information; even if the target code is obtained, the source code cannot be decoded from it.
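The lookup that turns node sequence P into the array above can be sketched as follows. The table is only an excerpt of MIT-2, limited to the entries that can be read off array A2 and the example target code in the text; `compileToTargetCode` is a hypothetical name, and rejecting unsupported nodes is a simplification of this sketch, since the text allows such nodes to be processed conventionally.

```javascript
// Excerpt of instruction mapping table MIT-2 (AST node -> virtual instruction).
const MIT2 = new Map([
  ["tryNumber", 0], ["RegExpLiteral", 1], ["try", 2], ["ifJump", 3],
  ["params", 4], ["typeof", 5], ["=", 6], ["++", 7], ["def_v", 8],
  ["--", 9], ["localScope", 10], ["property", 11], ["debugger", 12],
  ["get", 30],
]);

// Node sequence P from the text.
const P = ["debugger", "ifJump", "typeof", "debugger", "params", "=",
  "RegExpLiteral", "debugger", "typeof", "get", "RegExpLiteral", "tryNumber"];

// Replace each node by its virtual instruction.
function compileToTargetCode(nodeSequence, table) {
  return nodeSequence.map((node) => {
    if (!table.has(node)) throw new Error("unsupported AST node: " + node);
    return table.get(node);
  });
}

console.log(JSON.stringify(compileToTargetCode(P, MIT2)));
// prints [12,3,5,12,4,6,1,12,5,30,1,0]
```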
It will be appreciated that the transcoding process of the present application also supports the source code of signature algorithms such as MD5 and SHA256. The transcoding process achieves a dynamic obfuscation effect and, as shown in fig. 12, is implemented by the obfuscator of the compiler. Besides maintaining the instruction mapping table described above, the compiler also maintains the same facilities as other compilers, such as index pools and operators; this part is the same as in a conventional compiler and is not repeated here.
Regarding S107, generating an interpreter;
The interpreter matches the current instruction mapping table and is used to parse the target code generated by the current compilation, parsing the virtual instructions in the target code into virtual machine instructions.
A virtual machine instruction is an instruction that can be recognized and executed by the JavaScript engine. Virtual machine instructions include stack operation instructions, heap operation instructions, register operation instructions, control flow instructions, computation instructions, and environment interaction instructions.
Referring to fig. 2, in one possible implementation, generating the interpreter includes steps S201-S203:
S201, for each virtual instruction in the instruction mapping table, querying a preset instruction set to obtain the virtual machine instructions corresponding to that virtual instruction;
The preset instruction set comprises a plurality of nodes (completely consistent with the nodes of the instruction mapping table) prepared in advance and a plurality of virtual machine instructions corresponding to the nodes.
For example, in the preset instruction set, the virtual machine instruction corresponding to the AST node tryNumber is:
t0 = stack.pop();
t0 = Number(t0);
if (isNaN(t0)) t0 = -1;
stack.push(t0);
For virtual instruction 0 in the instruction mapping table, the corresponding AST node is tryNumber, and the virtual machine instructions above are found by looking up the AST node tryNumber in the preset instruction set.
S203, establishing a piece of parsing logic between each virtual instruction and its corresponding virtual machine instructions.
The parsing logic may be an if statement, a case statement, or the like: the condition of the if or case is the specific value of the virtual instruction, and the logic executed when the condition is satisfied is the virtual machine instructions. For the example of S201 above, the following parsing logic may be built:
if (instruction == 0) {
  t0 = stack.pop();
  t0 = Number(t0);
  if (isNaN(t0)) t0 = -1;
  stack.push(t0);
}
Similarly, for every other virtual instruction in the instruction mapping table, the virtual machine instructions corresponding to its AST node can be queried from the preset instruction set according to the AST node that the instruction mapping table associates with the virtual instruction, and the parsing logic can be established.
Therefore, based on the instruction mapping table and the preset instruction set, parsing logic between virtual instructions and virtual machine instructions is established in the interpreter, so that after the front end loads the interpreter, the interpreter can quickly convert the virtual instructions of the target code into executable virtual machine instructions.
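Steps S201 and S203 can be sketched as a small code generator that emits one `case` per virtual instruction. This is a minimal sketch under several assumptions: the surrounding read loop, the variable names `bytecode`, `stack`, and `index`, the function name `generateInterpreter`, and the use of `new Function` to make the result directly runnable are not taken from the embodiment, which would emit the interpreter as source text. The two instruction bodies are the tryNumber and ifJump examples from the text.

```javascript
// Excerpt of an instruction mapping table (AST node -> virtual instruction).
const mappingTable = new Map([["tryNumber", 0], ["ifJump", 3]]);

// Excerpt of the preset instruction set (AST node -> virtual machine
// instructions as source text).
const presetInstructionSet = {
  tryNumber: "t0 = stack.pop(); t0 = Number(t0); if (isNaN(t0)) t0 = -1; stack.push(t0);",
  ifJump: "t0 = stack.pop(); t1 = bytecode[index++]; if (!t0) index = t1;",
};

// S201: look up the virtual machine instructions for each virtual instruction.
// S203: wrap them in parsing logic, here one case per virtual instruction.
function generateInterpreter(table, instructionSet) {
  const cases = [];
  for (const [node, virtualInstruction] of table) {
    const body = instructionSet[node];
    if (body === undefined) throw new Error("no virtual machine instructions for " + node);
    cases.push("case " + virtualInstruction + ": { " + body + " break; }");
  }
  const source = [
    "let index = 0, t0, t1;",
    "while (index < bytecode.length) {",
    "  const instruction = bytecode[index++];",
    "  switch (instruction) {",
    ...cases.map((c) => "    " + c),
    "    default: throw new Error('unknown virtual instruction ' + instruction);",
    "  }",
    "}",
    "return stack;",
  ].join("\n");
  return new Function("bytecode", "stack", source);
}

const interpreter = generateInterpreter(mappingTable, presetInstructionSet);
console.log(interpreter([0], ["42"]));
```

Because the case labels come from the current mapping table, an interpreter generated for one compilation does not decode target code produced by another compilation.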
In one possible implementation, referring to fig. 3, generating the interpreter further includes:
S301, establishing environment monitoring logic, wherein the environment monitoring logic is used to determine whether an attribute of the host environment of the interpreter is a characteristic attribute of an automation framework and, if so, to throw an error.
The environment monitoring logic is in the same JavaScript document as the parsing logic and precedes the parsing logic. It checks whether the window and document objects of the host environment carry attributes set by an automation framework (including headless browsers, automated debugging tools, robots, simulation environments, and the like); specifically, it may classify the host environment by automation framework by scanning for specific strings or regular expressions. For example, an attribute mapping table may be constructed that contains the possible automation framework types and the attributes corresponding to each type. The environment monitoring logic traverses each automation framework type in the attribute mapping table and, for the type currently traversed, checks whether any attribute corresponding to that type (such as awesomium, cefSharp, __nightmare, or a matching regular expression) appears on the window and document objects of the host environment; if so, it determines and marks the host environment as that automation framework type. For example, if the CefSharp attribute is found on the window object, a possible CefSharp environment is identified.
In this manner, the environment monitoring logic detects different web automation frameworks by matching known attribute names unique to each technology, and may return an object in which each automation framework type corresponds to a Boolean value indicating whether any of its unique attributes was identified. Once the running environment of the code is found to come from an automation framework, an exception is thrown. This effectively prevents the interpreter from executing in an abnormal browser environment (such as a debugger or an automation tool), prevents the code from being maliciously debugged or analyzed, and further improves code security.
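The attribute-table check described above might look like the following minimal sketch. The attribute names are the ones mentioned in the text; the framework type labels, the function names `detectAutomation` and `assertTrustedEnvironment`, and the use of plain objects in place of the real window and document are assumptions made so that the sketch runs outside a browser. Regular-expression matching is omitted.

```javascript
// Attribute mapping table: automation framework type -> characteristic
// attribute names. The type labels are hypothetical.
const ATTRIBUTE_TABLE = {
  awesomium: ["awesomium"],
  cefSharp: ["CefSharp", "cefSharp"],
  nightmare: ["__nightmare"],
};

// Returns one boolean per framework type, true when any characteristic
// attribute is present on the window-like or document-like object.
function detectAutomation(win, doc) {
  const report = {};
  for (const [type, attrs] of Object.entries(ATTRIBUTE_TABLE)) {
    report[type] = attrs.some((name) => name in win || name in doc);
  }
  return report;
}

// Throws if any framework was detected, so the parsing logic is never reached.
function assertTrustedEnvironment(win, doc) {
  const report = detectAutomation(win, doc);
  const hits = Object.keys(report).filter((type) => report[type]);
  if (hits.length > 0) {
    throw new Error("automation framework detected: " + hits.join(", "));
  }
}

// Plain objects stand in for window and document.
console.log(detectAutomation({ CefSharp: {} }, {}));
```

In a browser, the call would be `assertTrustedEnvironment(window, document)` placed ahead of the parsing logic.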
Besides the parsing logic and environment monitoring logic described above, the interpreter also maintains the same facilities as other conventional interpreters, such as its own scope, constant pool, and global variables. This is similar to conventional interpreter generation, as in fig. 12, and is not expanded on here.
With respect to S109, the interpreter and the object code are output.
In one possible implementation, outputting the interpreter and the target code includes outputting the interpreter and the target code as a JavaScript file after obfuscation processing.
The interpreter code to be output and the target code are merged and then obfuscated, finally generating one JS file. Specifically, the target code may first be attached to the interpreter code, and the merged code as a whole is then obfuscated. The obfuscation may use conventional code protection means such as code compression, variable and function name obfuscation, and dead code injection. These measures ensure that, after obfuscation, the code can still be correctly recognized and executed by the virtual machine or the running environment, while preventing malicious actors from directly obtaining and analyzing the interpreter code and the target code. Finally, the obfuscated code is exported as a JS file, which can be uploaded to a server and issued by the server to the browser of a client.
Therefore, on top of dynamic instruction obfuscation, static protection is achieved by further obfuscating the interpreter and the target code. This dual defense of dynamic obfuscation plus static protection gives a geometric increase in security strength, defeats derivation of the instruction set mapping relationship, and fundamentally avoids the possibility of a hacker forging a custom interpreter.
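The merge step before obfuscation can be sketched as plain string assembly. In this minimal sketch, `bundle` is a hypothetical name, and the interpreter source is a stand-in that merely sums the virtual instructions so that the merged text is runnable; a real interpreter would dispatch them to parsing logic, and the obfuscation pass itself is not shown.

```javascript
// Attach the target code to the interpreter source and wrap both in one
// self-invoking function, giving a single JavaScript text.
function bundle(interpreterSource, targetCode) {
  return "(function () {\n" +
    "var bytecode = " + JSON.stringify(targetCode) + ";\n" +
    interpreterSource + "\n" +
    "return run(bytecode);\n" +
    "})()";
}

// Stand-in interpreter for illustration only.
const interpreterSource =
  "function run(code) { var sum = 0; for (var i = 0; i < code.length; i++) sum += code[i]; return sum; }";

const jsFileText = bundle(interpreterSource, [12, 3, 5]);
console.log(jsFileText);
// A further obfuscation pass (compression, renaming, dead code injection)
// would be applied to jsFileText before it is written out as the JS file.
```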
Referring to FIG. 4, in some embodiments, the method further includes, after step S101, step S401, subjecting the source code to syntax downgrading.
As one example, higher-version JavaScript syntax (e.g., ES6+) may be converted to lower-version syntax (e.g., ES5), specifically by transcoding with the Babel tool with the target environment set to ES5. For example, const/let is converted to var, arrow functions to ordinary functions, template strings to string concatenation, and so on.
Because the downgraded code has less syntactic variety, the subsequent AST node types are more uniform, which simplifies the generation logic of the instruction mapping table and reduces the size of its instruction set. For example, an ES6 class is downgraded to prototype syntax, which spares the virtual machine additional handling of class-related instructions. The simplified instruction set also improves compilation speed. Code downgrading can further strengthen the obfuscation effect: the downgraded code has lost higher-version syntactic features (e.g., the conciseness of arrow functions), which, combined with the subsequent obfuscation steps, further increases the difficulty of manual analysis and blocks AST restoration attacks, since an attacker cannot infer the source code logic by restoring higher-version syntactic features (e.g., reversing prototype code back into a class). In short, by downgrading the syntax of the source code, this embodiment simplifies the instruction set through syntax-level compression, thereby improving compilation speed, reducing the size of the interpreter, and strengthening the obfuscation effect.
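The kind of rewriting involved in syntax downgrading can be illustrated with a hand-written before-and-after pair. The ES5 half below is written by hand for illustration and is not actual Babel output, which typically adds helper functions; the names `GreeterES6`, `GreeterES5`, `shoutES6`, and `shoutES5` are hypothetical.

```javascript
// ES6+ form: class, template string, const, arrow function.
class GreeterES6 {
  constructor(name) { this.name = name; }
  greet() { return `hello ${this.name}`; }
}
const shoutES6 = (text) => text.toUpperCase();

// Hand-written ES5 equivalent: constructor function with a prototype
// method, string concatenation, var, ordinary function.
function GreeterES5(name) { this.name = name; }
GreeterES5.prototype.greet = function () { return "hello " + this.name; };
var shoutES5 = function (text) { return text.toUpperCase(); };

console.log(shoutES6(new GreeterES6("world").greet())); // prints HELLO WORLD
console.log(shoutES5(new GreeterES5("world").greet())); // prints HELLO WORLD
```

After downgrading, the compiler only has to cover function, prototype assignment, and concatenation nodes, not class, arrow function, and template literal nodes as well.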
In summary, the following beneficial effects exist in the embodiment of the present application:
1) Dynamic obfuscation defense based on the instruction mapping table: the instruction mapping table is generated by a pseudo-random algorithm to achieve dynamic instruction obfuscation, constructing a code structure that is difficult to predict from historical code, significantly increasing the difficulty of reverse engineering, and effectively preventing static decompilation analysis. In addition, the obfuscation is pure JavaScript obfuscation: a virtual instruction is not machine code and has no operational meaning, and only needs to be interpreted into virtual machine instructions by the interpreter. The compiled output is a JS file suitable for running in a pure JS environment and compatible with low-version browsers, which solves the low-version browser incompatibility and conspicuous call signatures of WebAssembly protection and detection schemes.
2) Dual protection architecture for the instruction set: the code produced by dynamic instruction obfuscation is obfuscated a second time together with the interpreter, which defeats derivation of the instruction set mapping relationship and fundamentally avoids the possibility of a hacker forging a custom interpreter.
3) Environment-aware anti-debugging mechanism: environment monitoring logic is embedded in the interpreter and actively interferes with the debugging process by throwing exceptions triggered by abnormal environments, which greatly increases the time cost of dynamic analysis and blocks the virtual machine image cloning attack path.
4) Compilation-optimized obfuscation paradigm: JS syntax downgrading is combined with dynamic instruction obfuscation, and the instruction set is simplified through syntax-level compression, which improves compilation speed and reduces the size of the interpreter.
In the embodiments of the application, a four-in-one defense matrix of dynamic obfuscation, instruction protection, environment awareness, and compilation optimization is constructed, balancing security protection strength against execution efficiency while preserving code equivalence.
It is noted that the present specification provides method operational steps as an example or a flowchart, but may include more or fewer operational steps based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one way of performing the order of steps and does not represent a unique order of execution. In practice, the method programs may be executed sequentially or in parallel (e.g., in a parallel processor or multithreaded environment) in accordance with the methods shown in the embodiments or figures.
Based on the same technical concept, referring to fig. 5, in a second aspect, the present application provides a code processing method, where the method is executed by a client, and may be implemented by loading and executing JavaScript files by JavaScript engines (e.g. V8, webkit) of various front-end web pages, applications, etc. of the client. Referring to fig. 5, the method includes steps S501-S503:
S501, acquiring the interpreter and the target code;
S503, executing a parsing result that the interpreter obtains by parsing the target code and that contains virtual machine instructions, so as to implement the function corresponding to the source code.
The interpreter matches the instruction mapping table. The instruction mapping table and the interpreter are each generated once in every compilation of the JavaScript source code, and each generated instruction mapping table is unique. The target code contains virtual instructions and is obtained by compiling the source code according to the instruction mapping table; a virtual instruction has no actual operational meaning and is used only for mapping.
The instruction mapping table carries a randomized mapping relationship between operations of the source code and virtual instructions. The randomized mapping relationship is obtained by randomly ordering a plurality of nodes of the abstract syntax tree and taking the sequence number of each node after the random ordering as its virtual instruction, thereby establishing the mapping relationship between nodes and virtual instructions. In some embodiments, the instruction mapping table includes a mapping relationship between nodes of the abstract syntax tree and virtual instructions, and the target code is obtained by querying the instruction mapping table to convert the nodes in the node sequence into the corresponding virtual instructions. The node sequence is formed by arranging, in order, a plurality of nodes of the preprocessed abstract syntax tree of the source code.
It will be appreciated that the interpreter and the object code are output by the method embodiments of the first aspect, so that further details may refer to the method steps of the embodiments of the development terminal, and are not described herein.
Referring to FIG. 6, in some embodiments, the interpreter parsing the object code referred to in S503 specifically includes steps S601-S603:
S601, the interpreter reads the target code in order;
Specifically, a JavaScript file issued by the server is obtained; the content of the JavaScript file is the interpreter and the target code after obfuscation processing. The obfuscation of the interpreter and the target code is code-level obfuscation and does not affect recognition by the JavaScript engine.
S603, when a virtual instruction is read, locating the parsing logic corresponding to that virtual instruction, and taking the virtual machine instructions in the parsing logic as the parsing result of the currently read virtual instruction.
For example, in the example of step S105, the resulting object code is a sequence of 12 virtual instructions [12, 3, 5, 12, 4, 6, 1, 12, 5, 30, 1, 0], which the interpreter would read sequentially.
1) The interpreter first reads the virtual instruction "12" and locates the following parsing logic:
if (instruction == 12) {
  debugger;
}
The resulting virtual machine instruction is "debugger".
2) The interpreter then reads the next virtual instruction "3" and locates the following parsing logic:
if (instruction == 3) {
  t0 = stack.pop();
  t1 = bytecode[index++];
  if (!t0) index = t1;
}
The resulting virtual machine instruction is:
t0 = stack.pop();
t1 = bytecode[index++];
if (!t0) index = t1;
And so on. Finally, a series of virtual machine instructions is obtained, and executing them achieves the effect of the source code.
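The read-and-locate loop of steps S601 and S603 can be sketched with a handler table keyed by virtual instruction. This is a minimal sketch under stated assumptions: the `vm` object, the names `createHandlers` and `run`, and the example target code are hypothetical; instructions 0 (tryNumber) and 3 (ifJump) follow the parsing logic in the text, while 12 (debugger) only records a trace entry so that the sketch does not pause under a debugger.

```javascript
// Parsing logic keyed by virtual instruction.
function createHandlers(vm) {
  return new Map([
    [0, () => {
      let t0 = Number(vm.stack.pop());
      if (isNaN(t0)) t0 = -1;
      vm.stack.push(t0);
    }],
    [3, () => {
      const t0 = vm.stack.pop();
      const t1 = vm.bytecode[vm.index++]; // operand: jump target
      if (!t0) vm.index = t1;
    }],
    [12, () => { vm.trace.push("debugger"); }],
  ]);
}

// S601: read the target code in order. S603: locate and run the parsing
// logic for each virtual instruction that is read.
function run(bytecode, stack) {
  const vm = { bytecode, stack, index: 0, trace: [] };
  const handlers = createHandlers(vm);
  while (vm.index < bytecode.length) {
    const instruction = bytecode[vm.index++];
    const handler = handlers.get(instruction);
    if (!handler) throw new Error("no parsing logic for " + instruction);
    handler();
  }
  return vm;
}

// Hypothetical target code: debugger, ifJump with operand 4, tryNumber, debugger.
const vm = run([12, 3, 4, 0, 12], ["7", true]);
console.log(vm.stack, vm.trace);
```

Note that, following the ifJump parsing logic in the text, the element after an ifJump instruction is consumed as its jump target and is not dispatched as an instruction.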
When the embodiment of the application is applied to a browser environment, referring to fig. 7, in some embodiments, the method further includes:
S701, executing the environment monitoring logic after the interpreter is started, wherein the environment monitoring logic determines whether an attribute of the host environment of the interpreter is a characteristic attribute of an automation framework and, if so, throws an error.
After the browser obtains the JS file issued by the server, the JavaScript engine in a web page of the browser loads the JS file and starts the interpreter in it, which in turn starts the environment monitoring logic in the interpreter. The environment monitoring logic checks, by scanning for specific strings or regular expressions, whether attributes on the window and document objects of the host environment belong to an automation framework. Specifically, an attribute mapping table is maintained in the environment monitoring logic; it contains the possible automation framework types and the attributes corresponding to each type (one type may correspond to several attributes). The environment monitoring logic traverses each automation framework type in the attribute mapping table and, for the type currently traversed, checks whether any attribute corresponding to that type (such as awesomium, cefSharp, __nightmare, or a matching regular expression) appears on the window and document objects of the host environment; if so, it determines and marks the host environment as that automation framework type. For example, if the CefSharp attribute is found on the window object, a possible CefSharp environment is identified. Such processing can block dynamic debugging and automation attacks.
Based on the same technical concept, in a third aspect, referring to fig. 8, an embodiment of the present application further provides a code processing apparatus, which may be specifically a compiler. The code processing apparatus includes respective modules for executing the code processing method of the above first aspect. For example, the code processing apparatus includes:
A code obtaining module 801, configured to obtain JavaScript source code to be protected;
A mapping table generating module 802, configured to generate an instruction mapping table before compiling the source code into the target code each time, where the instruction mapping table generated each time has uniqueness;
The source code compiling module 803 is configured to compile a source code into an object code including a virtual instruction according to a current instruction mapping table, where the virtual instruction is an instruction only used for mapping without an actual operation meaning;
The interpreter generating module 804 is configured to generate an interpreter, where the interpreter is matched with the current instruction mapping table, and is configured to parse a target code generated by current compiling, parse a virtual instruction in the target code into a virtual machine instruction, and the virtual machine instruction is an instruction that can be identified and executed by the JavaScript engine;
An output module 805 for outputting the interpreter and the object code.
The apparatus of the embodiment of the present application may perform the method provided by the embodiment of the present application, and its implementation principle is similar, and actions performed by each module in the apparatus of each embodiment of the present application correspond to steps in the method of the embodiment of the first aspect of the present application, and detailed functional descriptions of each module of the apparatus may be referred to in the foregoing description of the method embodiment of the first aspect, which is not repeated herein.
It should be noted that in the description of the various modules herein, the modules are divided for clarity of illustration. However, in actual implementation, the boundaries of the various modules may be fuzzy. For example, any or all of the functional modules in the present application may share various hardware and/or software elements. As another example, any and/or all of the functional blocks of the present application may be implemented in whole or in part by execution of software instructions by a common processor. In addition, various software sub-modules executed by one or more processors may be shared among various software modules. Accordingly, the scope of the present application is not limited by the mandatory boundaries between the various hardware and/or software elements unless expressly required.
Based on the same technical concept, in a fourth aspect, referring to fig. 9, the embodiment of the application further provides a code processing apparatus, which may specifically be a JavaScript engine installed at the front end. The code processing apparatus comprises means for performing the second aspect or the code processing method in any of the alternative implementations of the second aspect. For example, the code processing apparatus includes:
The acquisition module 901 is used for acquiring an interpreter and an object code, wherein the interpreter is matched with the instruction mapping table, the instruction mapping table and the interpreter are generated once in each compiling process of the JavaScript source code, and the generated instruction mapping table has uniqueness;
The execution module 902 is configured to execute an analysis result obtained by analyzing the object code by the interpreter, where the analysis result includes a virtual machine instruction, so as to implement a function corresponding to the source code, where the virtual machine instruction is an instruction that can be identified and executed by the JavaScript engine.
The apparatus according to the embodiment of the present application may perform the method provided by the embodiment of the present application, and its implementation principle is similar, and actions performed by each module in the apparatus according to each embodiment of the present application correspond to steps in the method according to the embodiment of the second aspect of the present application, and detailed functional descriptions of each module in the apparatus may be referred to in the foregoing description of the method embodiment of the second aspect, which is not repeated herein.
Based on the same technical concept, in a fifth aspect, referring to fig. 10, an embodiment of the present application further provides a computing device 1000, including a memory 1001 and a processor 1002, a communication module 1003, an input/output interface 1004, and other components, where connection communication between the components may be optionally implemented by using a bus 1005. The memory 1001 is for storing computer programs or instructions which, when executed by the processor 1002, implement the method steps in any of the method embodiments of the first or second aspect. It should be noted that the structure of the apparatus 1000 shown in fig. 10 is only schematic, and does not limit the apparatus to which the method provided in the embodiment of the present application is applied.
The computing device may be embodied as a server or a computer at the originating end, for implementing the method steps of any method embodiment of the first aspect, or as a computer at the client end, for implementing the method steps of any method embodiment of the second aspect.
The memory 1001 may be used to store an operating system and computer programs or instructions which, when invoked by the processor 1002, implement the method of the first or second aspect of the present application; the memory 1001 may also store programs for implementing other functions or services. The memory 1001 includes at least one type of computer-readable storage medium, including flash memory, hard disk, multimedia card, random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), magnetic disk, optical disk, and the like. In some embodiments, the computer-readable storage medium may be an internal storage unit of the computing device, such as a hard disk or memory of the computing device. In other embodiments, the computer-readable storage medium may be an external storage device of the computing device, such as a plug-in hard disk, Secure Digital (SD) card, or flash memory card fitted to the computing device. The computer-readable storage medium may also include both internal storage units and external storage devices of the computing device. In this embodiment, the computer-readable storage medium is typically used to store software installed on the computing device, such as the program code of the method embodiments of the first or second aspect. The computer-readable storage medium may also be used to temporarily store various types of data that have been output or are to be output.
The processor 1002 is connected to the memory 1001 through the bus 1005 and performs the corresponding functions by calling the application programs stored in the memory 1001. In some embodiments, the processor 1002 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other chip. The processor 1002 generally controls the overall operation of the computing device, for example performing control and processing related to data interaction or communication with other entities. In this embodiment, the processor 1002 is used to run the program code or process the data stored in the memory 1001.
The computing device 1000 may connect to a network through a communication module 1003 (which may include, but is not limited to, components such as a network interface) to enable interaction of data, such as sending data to or receiving data from other devices (e.g., user terminals or servers, etc.) through the network. The communication module 1003 may include a wired network interface and/or a wireless network interface, etc., that is, the communication module may include at least one of a wired communication module or a wireless communication module.
The computing device 1000 may be connected to required input/output devices, such as a keyboard or a display device, through the input/output interface 1004. The device 1000 itself may have a display device, and may also be connected to other external display devices through the interface 1004. It is understood that the input/output interface 1004 may be a wired interface or a wireless interface. Depending on the actual application scenario, a device connected to the input/output interface 1004 may be a component of the device 1000, or may be an external device connected to the device 1000 when needed.
The bus 1005 used to connect the components may include a path that transfers information between the components. The bus 1005 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. By function, the bus 1005 may be classified into an address bus, a data bus, a control bus, and the like.
Based on the same technical idea, the embodiments of the present application further provide a computer-readable storage medium in which a computer program or instructions are stored which, when executed by a processing device, implement the method steps of any method embodiment of the first or second aspect. For more details, refer to the method embodiments, which are not described again here. In this embodiment, the computer-readable storage medium may be nonvolatile or volatile. Computer-readable storage media include flash memory, hard disk, multimedia card, random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), magnetic disk, optical disk, and the like. In some embodiments, the computer-readable storage medium may be an internal storage unit of a computing device, such as a hard disk or memory of the computing device. In other embodiments, the computer-readable storage medium may be an external storage device of a computing device, such as a plug-in hard disk, Secure Digital (SD) card, or flash memory card fitted to the computing device. The computer-readable storage medium may also include both internal storage units and external storage devices of the computing device. In this embodiment, the computer-readable storage medium is typically used to store software installed on the computing device, such as the program code of the method embodiments of the first or second aspect. The computer-readable storage medium may also be used to temporarily store various types of data that have been output or are to be output.
Based on the same technical idea, embodiments of the present application also provide a computer program product or a computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the computer device performs the method provided by the method embodiments of the first aspect or the second aspect.
It should be noted that the order in which the embodiments of the present application are described does not imply any order of preference among them. Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art will understand, both explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
While the embodiments of the present application have been described above with reference to the drawings, the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive. Those of ordinary skill in the art may make many modifications without departing from the spirit of the present application and the scope of the appended claims. All such modifications, any equivalents made using the contents of the specification and drawings, and any direct or indirect application in other related technical fields fall within the protection of the present application.

Claims (13)

1. A code processing method, characterized in that the method comprises:
obtaining JavaScript source code that needs to be protected;
generating an instruction mapping table, the instruction mapping table being obtained by establishing a randomized mapping relationship between operations of the source code and virtual instructions, wherein the instruction mapping table generated each time is unique;
compiling, according to the current instruction mapping table, the source code into target code containing virtual instructions, the virtual instructions being instructions that have no actual operational meaning and are used only for mapping;
generating an interpreter, wherein the interpreter matches the current instruction mapping table and is used to parse the target code generated by the current compilation and to parse the virtual instructions in the target code into virtual machine instructions, the virtual machine instructions being instructions that a JavaScript engine can recognize and execute; and
outputting the interpreter and the target code.

2. The method according to claim 1, characterized in that establishing the randomized mapping relationship between operations of the source code and virtual instructions comprises:
randomly ordering a plurality of nodes of a pre-prepared abstract syntax tree, and determining the sequence number of each node after the random ordering as a virtual instruction, thereby establishing a mapping relationship between the nodes and the virtual instructions.

3. The method according to any one of claims 1-2, characterized in that the instruction mapping table comprises the mapping relationship between nodes of the abstract syntax tree and virtual instructions;
the method further comprises: parsing the source code into an abstract syntax tree, and preprocessing the abstract syntax tree to obtain a node sequence formed by arranging a plurality of nodes of the abstract syntax tree in order;
compiling, according to the current instruction mapping table, the source code into target code containing virtual instructions comprises: querying the instruction mapping table and converting the nodes in the node sequence into the corresponding virtual instructions, thereby obtaining the target code containing virtual instructions.

4. The method according to any one of claims 1-3, characterized in that the instruction mapping table comprises the mapping relationship between nodes of the abstract syntax tree and virtual instructions;
generating the interpreter comprises: for each virtual instruction in the instruction mapping table, querying a preset instruction set to obtain the virtual machine instruction corresponding to the virtual instruction, and establishing one piece of parsing logic between each virtual instruction and its corresponding virtual machine instruction;
wherein the preset instruction set comprises a plurality of nodes of the pre-prepared abstract syntax tree and the plurality of virtual machine instructions corresponding to them.

5. The method according to any one of claims 1-4, characterized in that the method further comprises: performing syntax downgrading on the source code when the source code is obtained.

6. The method according to any one of claims 1-5, characterized in that generating the interpreter comprises:
establishing environment monitoring logic, the environment monitoring logic being used to determine whether an attribute of the host environment of the interpreter is a characteristic attribute of an automation framework and, if so, to throw an error.

7. A code processing method, characterized in that it comprises:
obtaining an interpreter and target code, wherein: the interpreter matches an instruction mapping table; the instruction mapping table and the interpreter are each generated once in every compilation of the JavaScript source code, and the instruction mapping table generated each time is unique; the instruction mapping table is obtained by establishing a randomized mapping relationship between operations of the source code and virtual instructions; the target code contains virtual instructions and is obtained by compiling the source code according to the instruction mapping table, the virtual instructions being instructions that have no actual operational meaning and are used only for mapping;
executing the parsing result, containing virtual machine instructions, that the interpreter obtains by parsing the target code, so as to implement the function corresponding to the source code, the virtual machine instructions being instructions that a JavaScript engine can recognize and execute.

8. The method according to claim 7, characterized in that the interpreter parses the target code as follows:
reading the target code in order;
when a virtual instruction is read, locating the parsing logic corresponding to that virtual instruction, and taking the virtual machine instruction in that parsing logic as the parsing result of the virtual instruction currently read.

9. The method according to any one of claims 7-8, characterized in that the method further comprises:
after starting, the interpreter executes environment monitoring logic, the environment monitoring logic determining whether an attribute of the host environment of the interpreter is a characteristic attribute of an automation framework and, if so, throwing an error.

10. A code processing apparatus, characterized in that it comprises:
a code acquisition module, configured to obtain JavaScript source code that needs to be protected;
a mapping table generation module, configured to generate an instruction mapping table each time before the source code is compiled into target code, the instruction mapping table being obtained by establishing a randomized mapping relationship between operations of the source code and virtual instructions, wherein the instruction mapping table generated each time is unique;
a source code compilation module, configured to compile, according to the current instruction mapping table, the source code into target code containing virtual instructions, the virtual instructions being instructions that have no actual operational meaning and are used only for mapping;
an interpreter generation module, configured to generate an interpreter, wherein the interpreter matches the current instruction mapping table and is used to parse the target code generated by the current compilation and to parse the virtual instructions in the target code into virtual machine instructions, the virtual machine instructions being instructions that a JavaScript engine can recognize and execute;
an output module, configured to output the interpreter and the target code.

11. A code processing apparatus, characterized in that it comprises:
an acquisition module, configured to obtain an interpreter and target code, wherein: the interpreter matches an instruction mapping table; the instruction mapping table and the interpreter are each generated once in every compilation of the JavaScript source code, and the instruction mapping table generated each time is unique; the instruction mapping table is obtained by establishing a randomized mapping relationship between operations of the source code and virtual instructions; the target code contains virtual instructions and is obtained by compiling the source code according to the instruction mapping table, the virtual instructions being instructions that have no actual operational meaning and are used only for mapping;
an execution module, configured to execute the parsing result, containing virtual machine instructions, that the interpreter obtains by parsing the target code, so as to implement the function corresponding to the source code, the virtual machine instructions being instructions that a JavaScript engine can recognize and execute.

12. A computing device, characterized in that it comprises a memory and a processor, the memory being used to store a computer program or instructions; when the computer program or instructions are executed by the processor, the method of any one of claims 1 to 6 or any one of claims 7 to 9 is implemented.

13. A computer-readable storage medium, characterized in that the storage medium stores a computer program or instructions which, when executed by a processor, implement the method of any one of claims 1 to 6 or any one of claims 7 to 9.
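
The build-side steps recited in claims 1 to 4 (random ordering of node types, table lookup during compilation, and derivation of matching parsing logic) can be sketched as follows. This is a hedged illustration in Python, not the applicant's implementation: the four node type names, the postfix node sequence, and the function names `generate_mapping`, `compile_nodes`, and `generate_decode_table` are hypothetical. A plain shuffle also does not by itself guarantee that two builds never receive the same table; how uniqueness is enforced is not specified here.

```python
import random
from typing import Dict, List

# Hypothetical node types; a real tool would cover the JavaScript AST node set.
NODE_TYPES = ["PushConst", "Add", "Multiply", "Return"]


def generate_mapping(rng: random.Random) -> Dict[str, int]:
    """Shuffle the node types; each shuffled position becomes a virtual instruction."""
    shuffled = NODE_TYPES[:]
    rng.shuffle(shuffled)
    return {node: index for index, node in enumerate(shuffled)}


def compile_nodes(nodes: List[tuple], mapping: Dict[str, int]) -> List[int]:
    """Replace each node of a preprocessed node sequence with its virtual instruction."""
    code: List[int] = []
    for node_type, *operands in nodes:
        code.append(mapping[node_type])  # table lookup, as in claim 3
        code.extend(operands)            # operands are copied through unchanged
    return code


def generate_decode_table(mapping: Dict[str, int]) -> Dict[int, str]:
    """Invert the table so the matching interpreter can resolve virtual instructions."""
    return {virt: node for node, virt in mapping.items()}


# Assumed node sequence for `return (2 + 3) * 4`, already flattened to postfix order.
NODE_SEQUENCE = [
    ("PushConst", 2), ("PushConst", 3), ("Add",),
    ("PushConst", 4), ("Multiply",), ("Return",),
]

mapping = generate_mapping(random.Random())      # fresh table for this build
target_code = compile_nodes(NODE_SEQUENCE, mapping)
decode_table = generate_decode_table(mapping)    # shipped inside the interpreter
```

Because `decode_table` is derived from the same `mapping` as `target_code`, the interpreter and the target code of one build always agree with each other, while the numeric encoding differs from build to build whenever the shuffle differs.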
CN202511095798.7A 2025-08-05 2025-08-05 Code processing methods, devices, computing equipment, and storage media Pending CN120994203A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202511095798.7A CN120994203A (en) 2025-08-05 2025-08-05 Code processing methods, devices, computing equipment, and storage media

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202511095798.7A CN120994203A (en) 2025-08-05 2025-08-05 Code processing methods, devices, computing equipment, and storage media

Publications (1)

Publication Number Publication Date
CN120994203A true CN120994203A (en) 2025-11-21

Family

ID=97699514

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202511095798.7A Pending CN120994203A (en) 2025-08-05 2025-08-05 Code processing methods, devices, computing equipment, and storage media

Country Status (1)

Country Link
CN (1) CN120994203A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination