Disclosure of Invention
The present application provides a code processing method, a code processing apparatus, a computing device, and a storage medium, which address the insufficient protection strength of JavaScript protection schemes in the related art.
The application adopts the following technical scheme.
In a first aspect, the present application provides a code processing method, including:
acquiring JavaScript source code to be protected;
generating an instruction mapping table, wherein the instruction mapping table is obtained by establishing a randomized mapping relationship between operations of the source code and virtual instructions, and the instruction mapping table generated each time is unique;
compiling the source code into target code containing virtual instructions according to the current instruction mapping table, wherein a virtual instruction is an instruction that is used only for mapping and has no actual operational meaning;
generating an interpreter, wherein the interpreter matches the current instruction mapping table and is used for parsing the target code generated by the current compilation, so as to parse the virtual instructions in the target code into virtual machine instructions, a virtual machine instruction being an instruction that can be recognized and executed by a JavaScript engine; and
outputting the interpreter and the target code.
The present application provides obfuscation protection for JavaScript code. A unique instruction mapping table is regenerated for each compilation; that is, the instruction mapping table is dynamic, and because of the randomized mapping relationship, each generated table is difficult to predict or restore, which enhances code security. Likewise, an interpreter matching the current instruction mapping table is regenerated for each compilation. This achieves dynamic instruction obfuscation and yields a code structure that is difficult to predict from historical code, which increases the difficulty of reverse engineering, effectively prevents static decompilation analysis, and improves code security. The application is a pure JavaScript obfuscation scheme: its output runs in a pure JS environment and remains compatible with older browser versions, which addresses the incompatibility with older browsers and the conspicuous calling characteristics of WebAssembly-based protection schemes.
With reference to the first aspect, in one possible implementation, establishing the randomized mapping relationship between the operations of the source code and the virtual instructions includes: randomly ordering a plurality of pre-prepared abstract syntax tree nodes, and taking the sequence number of each node after the random ordering as a virtual instruction, thereby establishing a mapping relationship between the nodes and the virtual instructions.
Because the mapping relationship is established by randomly ordering the abstract syntax tree nodes, an attacker is effectively prevented from restoring the mapping relationship by statically analyzing the abstract syntax tree nodes, which further increases the difficulty of reverse analysis.
With reference to the first aspect, in one possible implementation, the instruction mapping table includes a mapping relationship between nodes of an abstract syntax tree and virtual instructions, and the method further includes: parsing the source code into an abstract syntax tree, and preprocessing the abstract syntax tree to obtain a node sequence formed by arranging a plurality of nodes of the abstract syntax tree in order.
In this implementation, compiling the source code into target code containing virtual instructions according to the current instruction mapping table includes: querying the instruction mapping table and converting the nodes in the node sequence into the corresponding virtual instructions, so as to obtain the target code containing the virtual instructions.
With reference to the first aspect, in one possible implementation, the instruction mapping table includes a mapping relationship between nodes of the abstract syntax tree and virtual instructions, and generating the interpreter includes: for each virtual instruction in the instruction mapping table, querying a preset instruction set to obtain the virtual machine instruction corresponding to that virtual instruction, and establishing parsing logic between each virtual instruction and its corresponding virtual machine instruction, wherein the preset instruction set includes a plurality of pre-prepared abstract syntax tree nodes and a plurality of virtual machine instructions corresponding to those nodes.
Because the parsing logic between virtual instructions and virtual machine instructions is built into the interpreter on the basis of the instruction mapping table, the interpreter, once loaded at the front end, can quickly convert the virtual instructions of the target code into executable virtual machine instructions.
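The interpreter generation described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the preset instruction set is a lookup from AST node name to a handler function operating on a numeric stack; the names `Stack`, `Handler`, `presetInstructionSet`, and `buildDispatch` are hypothetical and do not come from the application.

```typescript
// Hypothetical stack-machine state shared by all handlers (an assumption).
type Stack = number[];
type Handler = (stack: Stack) => void;

// Assumed preset instruction set: AST node name -> virtual machine instruction.
const presetInstructionSet: Record<string, Handler> = {
  "+": (s) => { const b = s.pop()!; const a = s.pop()!; s.push(a + b); },
  "*": (s) => { const b = s.pop()!; const a = s.pop()!; s.push(a * b); },
};

// For each virtual instruction in the mapping table, look up the matching
// virtual machine instruction and record the parsing logic between the two.
function buildDispatch(
  mappingTable: Record<string, number>,
  preset: Record<string, Handler>
): Record<number, Handler> {
  const dispatch: Record<number, Handler> = {};
  Object.keys(mappingTable).forEach((node) => {
    const handler = preset[node];
    if (handler === undefined) {
      throw new Error("no virtual machine instruction for node: " + node);
    }
    dispatch[mappingTable[node]] = handler;
  });
  return dispatch;
}

// Usage: in this compilation "*" happens to map to 0 and "+" to 1.
const exampleMappingTable: Record<string, number> = { "*": 0, "+": 1 };
const dispatch = buildDispatch(exampleMappingTable, presetInstructionSet);
const stack: Stack = [2, 3];
dispatch[1](stack); // runs the handler for "+", leaving [5] on the stack
```

Because the dispatch table is keyed by the virtual instruction rather than by the node name, a different mapping table yields a different interpreter even though the handlers themselves are unchanged.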
With reference to the first aspect, in one possible implementation, the method further includes performing syntax downgrading on the source code when the source code is acquired.
Syntax downgrading compresses the source code at the syntax level and thereby simplifies the instruction set, which increases compilation speed, reduces the size of the interpreter, and strengthens the obfuscation effect.
With reference to the first aspect, in one possible implementation, outputting the interpreter and the target code includes: obfuscating the interpreter and the target code and outputting them as a JavaScript file.
Thus, on the basis of dynamic instruction obfuscation, the interpreter and the target code are further obfuscated and output as a single JavaScript file, which disrupts derivation of the instruction set mapping relationship and fundamentally prevents an attacker from forging a custom interpreter.
With reference to the first aspect, in one possible implementation, the method further includes creating environment monitoring logic, wherein the environment monitoring logic is configured to determine whether an attribute of the host environment of the interpreter is a feature attribute of an automation framework, and to throw an error if so.
The environment monitoring logic thus effectively prevents the interpreter from executing in an abnormal browser environment (such as a debugger or an automation tool), prevents the code from being maliciously debugged or analyzed, and further improves code security.
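A minimal sketch of such environment monitoring logic is given below. The attribute list is an illustrative assumption rather than the list used by the application: `webdriver` is the flag that the WebDriver specification exposes on `navigator`, and the remaining names are globals commonly associated with older automation tools. The function name `assertTrustedHost` is hypothetical, and the host is passed in as a plain object so that the sketch does not depend on browser globals.

```typescript
// Illustrative automation-framework feature attributes (an assumption; a real
// deployment would choose and maintain its own list).
const AUTOMATION_ATTRIBUTES: string[] = [
  "webdriver",
  "_phantom",
  "callPhantom",
  "__nightmare",
];

// Throws if the host object (for example window or navigator) exposes a truthy
// automation attribute; otherwise returns normally.
function assertTrustedHost(host: Record<string, unknown>): void {
  for (const name of AUTOMATION_ATTRIBUTES) {
    if (host[name]) {
      throw new Error("untrusted host environment: " + name);
    }
  }
}

// Usage: an ordinary host passes, an automated host is rejected.
assertTrustedHost({ webdriver: false });
let automationDetected = false;
try {
  assertTrustedHost({ webdriver: true });
} catch (e) {
  automationDetected = true;
}
```

In a browser, the interpreter would call such a check with its real host objects before parsing any virtual instruction.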
In a second aspect, the present application also provides a code processing method, including:
acquiring an interpreter and target code, wherein the interpreter matches an instruction mapping table, the instruction mapping table and the interpreter are each generated once in every compilation of the JavaScript source code, the instruction mapping table generated each time is unique, and the instruction mapping table is obtained by establishing a randomized mapping relationship between operations of the source code and virtual instructions; and
executing a parsing result that is obtained by the interpreter parsing the target code and that contains virtual machine instructions, so as to implement the function corresponding to the source code, wherein a virtual machine instruction is an instruction that can be recognized and executed by a JavaScript engine.
Because the mapping table and the interpreter change dynamically with each compilation, dynamic instruction obfuscation is achieved and a code structure that is difficult to predict from historical code is constructed, which increases the difficulty of reverse engineering, effectively prevents static decompilation analysis, and improves code security. In addition, because the interpreter is generated together with the instruction mapping table at compile time and matches it, the interpreter can parse the virtual instructions back into virtual machine instructions, which ensures that the function of the protected code is consistent with that of the source code.
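The consistency between protected code and source code can be illustrated with a toy example, under stated assumptions. The three-operation instruction set (`push`, `add`, `mul`), the types `Op` and `MappingTable`, and the functions `compile` and `interpret` are all hypothetical; the match between interpreter and mapping table is modelled here simply by passing the same table to both functions, whereas the application generates a dedicated interpreter.

```typescript
// Hypothetical source operations for the expression (2 + 3) * 4.
type Op = { name: "push"; value: number } | { name: "add" } | { name: "mul" };
type MappingTable = { push: number; add: number; mul: number };

// Compile operations into target code using the current mapping table.
function compile(ops: Op[], table: MappingTable): number[] {
  const code: number[] = [];
  for (const op of ops) {
    code.push(table[op.name]);
    if (op.name === "push") {
      code.push(op.value); // the operand follows its virtual instruction
    }
  }
  return code;
}

// Interpret target code with an interpreter matched to the same table.
function interpret(code: number[], table: MappingTable): number {
  const operands: number[] = [];
  let pc = 0;
  while (pc < code.length) {
    const instruction = code[pc++];
    if (instruction === table.push) {
      operands.push(code[pc++]);
    } else if (instruction === table.add) {
      const b = operands.pop()!; const a = operands.pop()!;
      operands.push(a + b);
    } else if (instruction === table.mul) {
      const b = operands.pop()!; const a = operands.pop()!;
      operands.push(a * b);
    } else {
      throw new Error("unknown virtual instruction: " + instruction);
    }
  }
  return operands.pop()!;
}

const ops: Op[] = [
  { name: "push", value: 2 }, { name: "push", value: 3 }, { name: "add" },
  { name: "push", value: 4 }, { name: "mul" },
];
const tableA: MappingTable = { push: 0, add: 1, mul: 2 };
const tableB: MappingTable = { push: 2, add: 0, mul: 1 };
const codeA = compile(ops, tableA); // [0, 2, 0, 3, 1, 0, 4, 2]
const codeB = compile(ops, tableB); // [2, 2, 2, 3, 0, 2, 4, 1]
const resultA = interpret(codeA, tableA); // 20
const resultB = interpret(codeB, tableB); // 20
```

The two compilations produce different target code, yet each matched interpreter recovers the same result, which is the property the second aspect relies on.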
In a third aspect, the application further provides a code processing device. The code processing apparatus comprises means for performing the code processing method of the first aspect or any of the alternative implementations of the first aspect. For example, the code processing apparatus includes:
a code acquisition module, configured to acquire JavaScript source code to be protected;
a mapping table generation module, configured to generate an instruction mapping table each time before the source code is compiled into target code, wherein the instruction mapping table is obtained by establishing a randomized mapping relationship between operations of the source code and virtual instructions, and the instruction mapping table generated each time is unique;
a source code compiling module, configured to compile the source code into target code containing virtual instructions according to the current instruction mapping table, wherein a virtual instruction is an instruction that has no actual operational meaning and is used only for mapping;
an interpreter generation module, configured to generate an interpreter, wherein the interpreter matches the current instruction mapping table and is used for parsing the target code generated by the current compilation, so as to parse the virtual instructions in the target code into virtual machine instructions, a virtual machine instruction being an instruction that can be recognized and executed by a JavaScript engine; and
an output module, configured to output the interpreter and the target code.
For further implementation details of the code processing apparatus, refer to the description of any implementation of the first aspect above.
In a fourth aspect, the application further provides a code processing device. The code processing apparatus comprises means for performing the second aspect or the code processing method in any of the alternative implementations of the second aspect. For example, the code processing apparatus includes:
an acquisition module, configured to acquire an interpreter and target code, wherein the interpreter matches an instruction mapping table, the instruction mapping table and the interpreter are each generated once in every compilation of the JavaScript source code, the instruction mapping table generated each time is unique, and the instruction mapping table is obtained by establishing a randomized mapping relationship between operations of the source code and virtual instructions; and
an execution module, configured to execute a parsing result that is obtained by the interpreter parsing the target code and that contains virtual machine instructions, so as to implement the function corresponding to the source code, wherein a virtual machine instruction is an instruction that can be recognized and executed by a JavaScript engine.
For further implementation details of the code processing apparatus, refer to the description of any implementation of the second aspect above.
In a fifth aspect, the present application provides a computing device. The computing device comprises a processor and a memory, the memory being used for storing a computer program or instructions which, when executed by the processor, implement the method of the first aspect or any one of the possible implementations of the first aspect, or implement the method of the second aspect or any one of the possible implementations of the second aspect.
In a sixth aspect, the present application provides a computer-readable storage medium. The storage medium has stored therein a computer program or instructions which, when executed by a processor, implement the method of the first aspect or any of the possible implementations of the first aspect or implement the method of the second aspect or any of the possible implementations of the second aspect.
In a seventh aspect, the present application provides a computer program product. The computer program product comprises a computer program or instructions which, when executed by a processor, implement the method of the first aspect or any of the possible implementations of the first aspect or implement the method of the second aspect or any of the possible implementations of the second aspect.
For the advantages of the second to seventh aspects, refer to the first aspect or any possible implementation of the first aspect; they are not repeated here. The implementations provided in the above aspects may be further combined to provide additional implementations.
Additional advantages, objects, and features of the application will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the application.
Detailed Description
In the present application, the words "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items. In the present application, "at least one" means one or more, and "a plurality" means two or more. The terms including ordinal numbers such as "first", "second", and the like used in the present application may be used to describe various constituent elements, but these constituent elements are not limited by these terms. These terms are used only for distinguishing one component from another and are not to be construed as indicating or implying relative importance. For example, a first component may be termed a second component, and, similarly, a second component may be termed a first component, without departing from the scope of the present application.
Before describing embodiments of the present application, technical terms and background related to the present application will be described first.
Headless browser: a running mode in which a complete browser engine is controlled through a command line interface to perform page rendering and script parsing.
Control flow flattening: a code obfuscation technique that converts nested code logic into a linear execution structure and controls the execution order of basic blocks through a dispatcher.
Pseudo-random algorithm: a deterministic value generation method that, initialized with a seed value, produces a reproducible sequence that is statistically similar to random numbers.
Abstract syntax tree (AST): a structural representation of source code that precisely describes the hierarchical relationships between the syntax elements of a program in a tree-like data structure.
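As a minimal illustration of control flow flattening as defined above, the following sketch shows a small function before and after a hand-written flattening. The function names and the numbering of the states are hypothetical; real obfuscators generate the dispatcher automatically and typically randomize the state values.

```typescript
// Original nested logic.
function classify(n: number): string {
  if (n < 0) {
    return "negative";
  }
  if (n === 0) {
    return "zero";
  }
  return "positive";
}

// Flattened form: every basic block becomes one case of a single switch, and
// the variable `state` acts as the dispatcher that selects the next block.
function classifyFlattened(n: number): string {
  let state = 0;
  let result = "";
  while (state !== -1) {
    switch (state) {
      case 0: state = n < 0 ? 1 : 2; break;
      case 1: result = "negative"; state = -1; break;
      case 2: state = n === 0 ? 3 : 4; break;
      case 3: result = "zero"; state = -1; break;
      case 4: result = "positive"; state = -1; break;
      default: throw new Error("unreachable state: " + state);
    }
  }
  return result;
}
```

Both functions return the same value for every input, but the flattened version no longer exposes the original nesting of the conditions.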
With the rapid development of Internet technology and Web applications, JavaScript has become the mainstream scripting language for Web front-end development. It is widely used to implement rich page interaction as well as security-sensitive functions such as front-end risk control, anti-crawler detection, and device fingerprinting. However, because JavaScript is transmitted in plaintext and executed in the client browser, an attacker can easily analyze, decompile, and extract the code by means of the developer tools built into the browser (e.g., Chrome DevTools) or an abstract syntax tree (AST) parsing tool (e.g., Babel). The sensitive logic, algorithms, and business rules in the code are thereby exposed, which creates a serious security risk.
To prevent sensitive JavaScript code from being easily analyzed and restored, code obfuscation techniques are currently widely used in the industry. They make the code harder for an attacker to analyze by changing variable names, inserting redundant code, or modifying the code structure. However, such static obfuscation is essentially a simple transformation of the code text and does not fundamentally change the execution mode or structural characteristics of the code, so an attacker can still deobfuscate it with AST tools, manual analysis, and similar means, and restore the original logic.
Recently, some research has proposed virtualization protection schemes for front-end JavaScript code that attempt to hide the original execution logic by compiling the JavaScript code into specific virtual instructions. The technical idea is to translate the JavaScript code into WebAssembly (WASM) bytecode and execute it with the WebAssembly engine built into the browser, so as to strengthen code protection. Although such a scheme has certain advantages in performance and security, it has obvious drawbacks. First, WebAssembly is not supported by all browsers, and older browser versions in particular are incompatible, so the scheme has poor compatibility and is difficult to apply widely. Second, the calling characteristics of WebAssembly are conspicuous, so an attacker can easily identify and analyze the module calling pattern and the instruction encoding rules. Third, existing WebAssembly virtualization schemes generally adopt a fixed instruction encoding and lack the ability to update and change dynamically, so their protection strength gradually declines as attackers accumulate analysis experience.
Existing front-end JavaScript protection schemes therefore still have obvious shortcomings: JavaScript code stored in plaintext or in a simple statically obfuscated form is easily extracted and analyzed with browser debugging tools; statically obfuscated code is easily reverse-analyzed and restored with AST tools; existing WebAssembly schemes have limited compatibility and cannot support all browsers; and virtualization protection schemes with a fixed instruction set are easily identified and cracked by attackers, so security is difficult to guarantee over the long term.
In summary, the related art has a problem that the JavaScript protection scheme is insufficient in protection strength. In order to solve the above problems, an embodiment of the present application provides a code processing method, which can greatly improve code security.
As shown in fig. 12, the present application is a JavaScript code virtualization protection scheme based on the JSVmp (JavaScript Virtual Machine Protection) concept. The idea is as follows: each time the compiler compiles source code, it dynamically generates an instruction mapping table and an interpreter matching that table, and compiles the source code into virtual instructions according to the table. The compiled target code is executed in the simulated environment of a front-end virtual machine; specifically, the target code is interpreted by the interpreter generated at compile time, which matches the instruction mapping table and parses the virtual instructions into instructions that the front-end virtual machine (essentially, the JavaScript engine) can recognize and execute.
One or more exemplary operating environments are described first, to make the role and intent of the various implementations in the embodiments of the present application easier to understand. Fig. 11 is a schematic diagram of an implementation environment of the method provided by an embodiment of the present application. The method may be applied in the environment of fig. 11, which includes a development end, a server, and a client. The development end and the client are each communicatively connected to the server. The client includes any type of terminal device, for example one running iOS or Android, such as a smartphone, tablet computer, iMac, or PAD. The server is to be construed broadly as an entity capable of responding to external requests and providing data, resources, or services; it may be a single server or a server cluster. The development end may have one or more computer devices on which a code development tool is installed, so that JavaScript source code can be developed. The source code may be written at the development end and sent to the server, where a compiler on the server side compiles it; alternatively, a compiler installed at the development end may perform the compilation. In practice, a developer generally writes and compiles the JavaScript source code on a computer device at the development end, and the JavaScript file finally output by the compilation is uploaded to the server. The client obtains the JavaScript file (JS file for short) from the server, and the JavaScript engine of the client loads and executes the JS file.
In the present application, the JavaScript source code to be protected is compiled into a JS file at the development end, and the resulting JS file is deployed in front-end web pages, applications, and other environments on the client. Any host environment capable of executing the JS file falls within the applicable scope.
The technical scheme of the application is described below through a plurality of embodiments. It should be understood that these embodiments may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein.
Referring to fig. 1, in a first aspect, the present application provides a code processing method. The method may be executed by the server (in which case the development end is responsible only for writing the source code) or by the development end; the present application is not limited in this respect. The following embodiments take execution by the development end as an example. As shown in fig. 1, the code processing method provided by the embodiment of the present application includes steps S101 to S109:
S101, acquiring JavaScript source code to be protected;
S103, generating an instruction mapping table, wherein the instruction mapping table generated each time has uniqueness;
S105, compiling a source code into a target code containing a virtual instruction according to a current instruction mapping table, wherein the virtual instruction is an instruction which does not have actual operation meaning and is only used for mapping;
S107, generating an interpreter, wherein the interpreter is matched with the current instruction mapping table and is used for analyzing the target code generated by current compiling, analyzing a virtual instruction in the target code into a virtual machine instruction, and the virtual machine instruction is an instruction which can be identified and executed by a JavaScript engine;
S109, outputting the interpreter and the target code.
Thus, the present application provides obfuscation protection for JavaScript code at the compilation stage. A unique instruction mapping table is regenerated for each compilation, that is, the instruction mapping table is dynamic; likewise, an interpreter matching the current instruction mapping table is regenerated for each compilation. This achieves dynamic instruction obfuscation and yields a code structure that is difficult to predict from historical code, which increases the difficulty of reverse engineering, effectively prevents static decompilation analysis, and improves code security. In addition, because the obfuscation is pure JavaScript, the output runs in a pure JS environment and remains compatible with older browser versions, which addresses the incompatibility with older browsers and the conspicuous calling characteristics of WebAssembly-based protection schemes.
Each of steps S101-S109, and optionally other steps, are described in detail below in conjunction with fig. 2-7.
Regarding S101, acquiring JavaScript source code to be protected;
A compiler (or compilation tool) is installed at the development end, and the method of this embodiment may be implemented by the compiler. After a developer writes the JavaScript source code to be protected using a development tool, the source code is imported into the compiler, which thereby acquires it.
Referring to fig. 2, the method further includes a step S201, after the step S101, of parsing the source code into an abstract syntax tree, and preprocessing the abstract syntax tree to obtain a node sequence formed by arranging a plurality of nodes of the abstract syntax tree in order.
For example, Babel is used to parse the source code into an abstract syntax tree (AST), the AST is traversed for preprocessing, and the AST is finally converted into a node sequence. The node sequence is essentially a plurality of nodes of the abstract syntax tree (hereinafter, AST nodes) arranged in order. The AST node representation is a low-level representation.
Preprocessing an AST is a mature technique. It includes collecting string literals into an array, control flow flattening, adding code dead zones, adding random dead code blocks, transforming object literals, renaming object properties, and the like. As shown in fig. 12, the preprocessing may be implemented by a preprocessor of the compiler.
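The conversion of an AST into a node sequence can be sketched as follows. This is only an illustration: the `AstNode` shape is a simplified stand-in for the nodes a real parser such as Babel would produce, and the post-order traversal is an assumption, since the application does not state the order in which nodes are arranged.

```typescript
// Simplified, hypothetical AST node: a type name plus child nodes.
interface AstNode {
  type: string;
  children: AstNode[];
}

// Post-order traversal: children are emitted before their parent, which is a
// common linearization when the result is later executed on a stack machine.
function toNodeSequence(root: AstNode): string[] {
  const sequence: string[] = [];
  const visit = (node: AstNode): void => {
    for (const child of node.children) {
      visit(child);
    }
    sequence.push(node.type);
  };
  visit(root);
  return sequence;
}

// Hand-built AST for the statement `a = 1 + 2`.
const assignment: AstNode = {
  type: "=",
  children: [
    { type: "identifier", children: [] },
    {
      type: "+",
      children: [
        { type: "numeric", children: [] },
        { type: "numeric", children: [] },
      ],
    },
  ],
};
const nodeSequence = toNodeSequence(assignment);
// nodeSequence is ["identifier", "numeric", "numeric", "+", "="]
```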
For example, for the source code con.log('hello world'), the node sequence P obtained after step S101 contains 12 AST nodes in total:
P = [debugger, ifJump, typeof, debugger, params, '=', RegExpLiteral, debugger, typeof, get, RegExpLiteral, tryNumber]
The node sequence P is listed in Table 1 as follows:
Table 1 Node sequence P
Regarding S103, generating an instruction mapping table;
In the embodiment of the present application, an instruction mapping table is generated once for each compilation. Each generated instruction mapping table is unique and is not reused; that is, it can be used only for the current compilation and not for any other compilation.
Generating the instruction mapping table includes establishing a randomized mapping relationship between the operations of the source code and virtual instructions.
A virtual instruction is an instruction that is used only for mapping and has no actual operational meaning. For example, a virtual instruction may be a number such as 1, 2, or 3, a letter string such as a, b, c, ax, or dge, any symbol without operational meaning such as $, #, or %, or a mixture of any of the above.
The randomized mapping relationship can be established by means of a pseudo-random algorithm. In one possible implementation, establishing the randomized mapping relationship between the operations of the source code and the virtual instructions includes: randomly ordering (e.g., using a pseudo-random algorithm) a plurality of pre-prepared abstract syntax tree nodes, and taking the sequence number of each node after the random ordering as a virtual instruction, thereby establishing a mapping relationship between the nodes and the virtual instructions. That is, the instruction mapping table includes the mapping relationship between the nodes of the abstract syntax tree (AST nodes) and the virtual instructions.
For example, a plurality of AST nodes prepared in advance may be placed in an array a, each element in the array a representing one AST node:
A = ['tryNumber', 'RegExpLiteral', 'try', 'ifJump', 'params', 'typeof', '=', '++', 'def_v', '--', 'localScope', 'property', ......];
The Fisher-Yates shuffling algorithm (as shown by the shuffleArray function below) may then be applied to shuffle the order of the elements in array A:
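// Fisher-Yates shuffle: permutes the array in place and returns it.
function shuffleArray(array: Array<string>): Array<string> {
  for (let i = array.length - 1; i > 0; i--) {
    // Pick a random index j in the range [0, i].
    const j = Math.floor(Math.random() * (i + 1));
    // Swap the elements at positions i and j.
    [array[i], array[j]] = [array[j], array[i]];
  }
  return array;
}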
Passing array A to this function yields the shuffled array A1, for example:
A1 = ['new', 'globalScope', 'switchJump', 'ifJump', '>>>', 'typeof', '>>', '++', 'def_v', '--', '<<', 'property', ......];
The sequence number of each element in A1 is its virtual instruction. Arranging the elements of A1 and their sequence numbers into Table 2 below gives the instruction mapping table MIT-1:
Table 2 Instruction mapping table MIT-1
In the next compilation, the Fisher-Yates shuffling algorithm is applied again to shuffle the order of the elements in array A, which may yield, for example:
A2 = ['tryNumber', 'RegExpLiteral', 'try', 'ifJump', 'params', 'typeof', '=', '++', 'def_v', '--', 'localScope', 'property', 'debugger', ......];
The sequence number of each element in A2 is its virtual instruction. Arranging the elements of A2 and their sequence numbers into Table 3 below gives the instruction mapping table MIT-2:
Table 3 Instruction mapping table MIT-2
It can be seen that the instruction mapping table generated each time is different; that is, the virtual instruction corresponding to an AST node is not fixed, because the sequence numbers (virtual instructions) of the elements (AST nodes) of array A are reshuffled in every compilation. Establishing the mapping relationship by applying a pseudo-random ordering algorithm to the AST nodes therefore effectively prevents an attacker from restoring the mapping relationship through static analysis of the abstract syntax tree nodes, which further increases the difficulty of reverse analysis.
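The construction of an instruction mapping table from a pseudo-random ordering can be sketched as below. As an assumption made for reproducibility, the sketch replaces Math.random with a small seeded linear congruential generator (using the widely published constants 1664525 and 1013904223), so that one seed always yields the same table; the names `createSeededRandom` and `generateMappingTable` are hypothetical. Such a generator is not cryptographically secure, and how a seed would be chosen per compilation is not specified in the application.

```typescript
// Seeded linear congruential generator returning values in [0, 1).
function createSeededRandom(seed: number): () => number {
  let state = seed >>> 0; // force the seed into the unsigned 32-bit range
  return () => {
    state = (1664525 * state + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Shuffle a copy of the pre-prepared AST nodes with Fisher-Yates and use each
// node's position after shuffling as its virtual instruction.
function generateMappingTable(nodes: string[], seed: number): Record<string, number> {
  const random = createSeededRandom(seed);
  const shuffled = nodes.slice();
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const table: Record<string, number> = {};
  shuffled.forEach((node, index) => {
    table[node] = index;
  });
  return table;
}

const astNodes = ["tryNumber", "RegExpLiteral", "try", "ifJump", "params", "typeof"];
const firstTable = generateMappingTable(astNodes, 42);
const repeatedTable = generateMappingTable(astNodes, 42); // same seed, same table
```

Every table produced this way is a one-to-one assignment of the sequence numbers 0 to n-1 to the n nodes, so each virtual instruction identifies exactly one AST node.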
Regarding S105, compiling the source code into an object code containing virtual instructions according to the current instruction mapping table;
Specifically, compiling the source code into an object code containing virtual instructions according to the current instruction mapping table comprises: querying the instruction mapping table and converting each node in the node sequence into its corresponding virtual instruction, thereby obtaining the object code containing the virtual instructions.
It should be noted that the source code may be transcoded into virtual instructions completely or partially; that is, the target code may consist entirely of virtual instructions, or partly of virtual instructions and partly of conventionally processed code, all of which fall within the scope of the present application. Whether transcoding is complete or partial can be controlled by configuring which AST nodes the instruction mapping table supports: only AST nodes supported by the instruction mapping table are transcoded into virtual instructions, and AST nodes not supported by it are processed in a conventional manner. For example, source code generally contains arithmetic operators (e.g., addition, subtraction, multiplication, division) and operation constructs (e.g., for loops, delete operations). In some embodiments, the AST nodes supported by the instruction mapping table cover all of these; in one embodiment, covering only the operators is sufficient to achieve the desired obfuscation effect.
For example, suppose that in the current compilation the node sequence obtained in step S101 is P (Table 1) and the instruction mapping table generated in step S103 is MIT-2 (Table 3). Referring to Tables 1 and 3, a series of virtual instructions (denoted instructions) is obtained, as shown in Table 4 below:
Table 4: Example of converting the AST nodes obtained in step S101 into virtual instructions
It can be seen that every node in the node sequence of step S101 (Table 1) is an AST node supported by the instruction mapping table MIT-2 (Table 3). Transcoding the node sequence against MIT-2 yields the series of virtual instructions in the second column of Table 4; that is, the target code in step S105 consists entirely of virtual instructions. The target code finally obtained for the source code log ('hello world') is a sequence of 12 virtual instructions, represented as the following array:
[12, 3, 5, 12, 4, 6, 1, 12, 5, 30, 1, 0];
It can be seen that the virtual instructions in the object code are a series of numbers (other characters are possible in other embodiments) that carry no operation information, so even if the object code is obtained, the source code cannot be decoded from the object code alone.
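The table lookup performed in step S105 can be sketched as below. This is an illustration under stated assumptions: the node names and the function compileNodes are hypothetical, operands are omitted, and unsupported nodes are merely collected so that they could be handed to conventional processing.

```javascript
// Hypothetical mapping table for one compilation: AST node name -> virtual instruction.
const exampleMap = new Map([
  ["debugger", 12],
  ["jumpIfFalse", 3],
  ["tryNumber", 0],
]);

// Convert each node to its virtual instruction; collect nodes the table does not support.
function compileNodes(nodeSequence, instructionMap) {
  const objectCode = [];
  const unsupported = [];
  for (const node of nodeSequence) {
    if (instructionMap.has(node)) {
      objectCode.push(instructionMap.get(node));
    } else {
      unsupported.push(node); // left for conventional (non-virtualized) processing
    }
  }
  return { objectCode, unsupported };
}

const compiled = compileNodes(
  ["debugger", "tryNumber", "classDeclaration", "jumpIfFalse"],
  exampleMap
);
console.log(compiled.objectCode);  // [12, 0, 3]
console.log(compiled.unsupported); // ["classDeclaration"]
```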
It will be appreciated that the above transcoding process also supports source code of signature algorithms such as MD5 and SHA256. The transcoding process achieves a dynamic obfuscation effect and, as shown in fig. 12, is implemented by the obfuscator of the compiler. Apart from maintaining the instruction mapping table described above, the compiler also maintains the same facilities as other compilers, such as index pools and operators; this part is the same as in a conventional compiler and is not repeated here.
Regarding S107, generating an interpreter;
The interpreter matches the current instruction mapping table and is used to parse the target code generated by the current compilation, parsing each virtual instruction in the target code into a virtual machine instruction.
The virtual machine instruction is an instruction which can be identified and executed by the JavaScript engine and comprises a stack operation instruction, a heap operation instruction, a register operation instruction, a control flow instruction, an operation instruction and an environment interaction instruction.
Referring to fig. 2, in one possible implementation, generating the interpreter comprises steps S201-S203:
S201, for each virtual instruction in the instruction mapping table, querying a preset instruction set to obtain the virtual machine instruction corresponding to that virtual instruction;
The preset instruction set comprises a plurality of nodes (completely consistent with the nodes of the instruction mapping table) prepared in advance and a plurality of virtual machine instructions corresponding to the nodes.
For example, in the preset instruction set, the virtual machine instruction corresponding to the AST node tryNumber is:
t0 = stack.pop();
t0 = Number(t0);
if (isNaN(t0)) t0 = -1;
stack.push(t0);
For a virtual instruction in the instruction mapping table whose corresponding AST node is tryNumber, the virtual machine instruction above is found by looking up the AST node tryNumber in the preset instruction set.
S203, establishing a piece of analysis logic between each virtual instruction and the corresponding virtual machine instruction.
The parsing logic may be an if statement, a case statement, or the like: the condition of the if or case is the specific value of the virtual instruction, and the logic executed when the condition is satisfied is the virtual machine instruction. For the example in S201 above, the following parsing logic may be built:
if (instruction == 0) {
  t0 = stack.pop();
  t0 = Number(t0);
  if (isNaN(t0)) t0 = -1;
  stack.push(t0);
}
Similarly, for each other virtual instruction in the instruction mapping table, according to the AST node corresponding to the virtual instruction in the instruction mapping table, the virtual machine instruction corresponding to the AST node in the preset instruction set can be queried, and the analysis logic can be established.
Therefore, based on the instruction mapping table and the preset instruction set, parsing logic between each virtual instruction and its virtual machine instruction is established in the interpreter, so that once the interpreter has been loaded at the front end it can quickly convert the virtual instructions of the target code into executable virtual machine instructions.
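Steps S201 and S203 can be sketched as a small code generator. This is only a sketch under assumptions: the preset instruction set is modeled as an object of source-text snippets, the pushConst node and the instruction values 0 and 7 are hypothetical, and new Function is used solely to demonstrate that the generated text runs.

```javascript
// Hypothetical preset instruction set: AST node name -> virtual machine instruction text.
const PRESET_INSTRUCTION_SET = {
  tryNumber: "t0 = stack.pop(); t0 = Number(t0); if (isNaN(t0)) t0 = -1; stack.push(t0);",
  pushConst: "stack.push(bytecode[index++]);", // hypothetical node, for demonstration only
};

// S201 + S203: look up each node's virtual machine instruction and wrap it in an if branch.
function generateParsingLogic(instructionMap, presetSet) {
  const branches = [];
  for (const [node, virtualInstruction] of instructionMap) {
    const body = presetSet[node];
    if (body === undefined) throw new Error("No virtual machine instruction for node: " + node);
    branches.push("if (instruction == " + virtualInstruction + ") { " + body + " }");
  }
  return branches.join("\n");
}

// Wrap the branches in a read loop so the generated text is a complete function body.
function generateInterpreterSource(instructionMap, presetSet) {
  return [
    "var stack = [], index = 0, t0, t1, instruction;",
    "while (index < bytecode.length) {",
    "instruction = bytecode[index++];",
    generateParsingLogic(instructionMap, presetSet),
    "}",
    "return stack;",
  ].join("\n");
}

// Mapping table of the current compilation (values are hypothetical).
const currentMap = new Map([["tryNumber", 0], ["pushConst", 7]]);
const interpreterSource = generateInterpreterSource(currentMap, PRESET_INSTRUCTION_SET);
const run = new Function("bytecode", interpreterSource);
console.log(run([7, "42", 0]));  // stack holds 42
console.log(run([7, "abc", 0])); // stack holds -1
```

Because the branch conditions are taken from the current mapping table, an interpreter generated for one compilation does not decode object code produced by another.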
In one possible implementation, referring to fig. 3, generating the interpreter further includes:
S301, establishing environment monitoring logic, wherein the environment monitoring logic is used to judge whether an attribute of the host environment of the interpreter is a characteristic attribute of an automation framework and, if so, to throw an error.
The environment monitoring logic is in the same JavaScript document as the parsing logic and precedes it. The environment monitoring logic checks whether the window and document objects of the host environment carry attributes set by an automation framework (including headless browsers, automated debugging tools, robots, simulated environments, and the like); specifically, it can classify which automation framework the host environment belongs to by scanning for specific character strings or regular expressions. For example, an attribute mapping table may be constructed that lists the possible automation framework types and the attributes corresponding to each type. The environment monitoring logic traverses each automation framework type in the attribute mapping table and, for the type currently traversed, checks whether any attribute corresponding to that type (such as awesomium, cefSharp, __nightmare, or a matching regular expression) appears in the window and document objects of the host environment; if so, it determines and marks the host environment as that automation framework type. For example, if the CefSharp attribute is found in the window object, a possible CefSharp environment is identified.
In this manner, the environment monitoring logic detects different web automation frameworks by matching known attribute names that are unique to each technology, and may return an object in which each automation framework type corresponds to a boolean value describing whether any of its unique attributes was identified. Once the running environment of the code is found to come from an automation framework, an exception is thrown, which effectively prevents the interpreter from executing in an abnormal browser environment (such as a debugger or an automation tool), prevents the code from being maliciously debugged or analyzed, and further improves the security of the code.
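The attribute mapping table and traversal described above can be sketched as follows. The attribute names awesomium, CefSharp and __nightmare come from the text; the hypotheticalTool entry, the function names, and passing the window and document objects as parameters (so the sketch also runs outside a browser) are assumptions.

```javascript
// Attribute mapping table: automation framework type -> attribute names or patterns.
const AUTOMATION_ATTRIBUTES = {
  awesomium: ["awesomium"],
  cefSharp: ["CefSharp"],
  nightmare: ["__nightmare"],
  hypotheticalTool: [/^\$hypo_[a-z]+$/], // hypothetical pattern, for illustration only
};

// True if `target` carries the named attribute or any own property matching the pattern.
function hasAttribute(target, attribute) {
  if (!target) return false;
  if (typeof attribute === "string") return attribute in target;
  return Object.getOwnPropertyNames(target).some((name) => attribute.test(name));
}

// Returns an object mapping each framework type to a boolean.
function detectAutomation(win, doc) {
  const result = {};
  for (const [type, attributes] of Object.entries(AUTOMATION_ATTRIBUTES)) {
    result[type] = attributes.some((a) => hasAttribute(win, a) || hasAttribute(doc, a));
  }
  return result;
}

// Throws when any framework type was detected.
function assertTrustedEnvironment(win, doc) {
  const result = detectAutomation(win, doc);
  const detected = Object.keys(result).filter((type) => result[type]);
  if (detected.length > 0) {
    throw new Error("Automation framework detected: " + detected.join(", "));
  }
}

console.log(detectAutomation({ CefSharp: {} }, {}));
```

In a browser the call would presumably be assertTrustedEnvironment(window, document), placed before the parsing logic.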
Apart from the parsing logic and environment monitoring logic described above, the interpreter also maintains the same facilities as other conventional interpreters, such as its own scope, constant pool, and global variables. This part is similar to conventional interpreter generation, as shown in fig. 12, and is not expanded here.
With respect to S109, the interpreter and the object code are output.
In one possible implementation, outputting the interpreter and the target code includes outputting the interpreter and the target code as a JavaScript file after obfuscation processing.
The interpreter code and the target code to be output are combined and then obfuscated, finally generating one JS file. Specifically, the object code may first be attached to the interpreter code, and the merged code as a whole may then be obfuscated. The obfuscation may employ conventional code protection means such as code compression, variable-name and function-name obfuscation, and dead code injection. These measures ensure that the obfuscated code can still be correctly recognized and executed by the virtual machine or the operating environment, while preventing attackers from directly retrieving and analyzing the interpreter code and object code. Finally, the obfuscated code is exported as a JS file, which can be uploaded to a server and delivered by the server to the browser of a client.
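The merge-then-obfuscate order can be sketched as below. The function bundle and its parameters are hypothetical, the interpreter text is a trivial stand-in, and the obfuscation step is left as a pluggable function that defaults to the identity, since the text names only categories of conventional protection rather than a specific tool.

```javascript
// Attach the object code to the interpreter code, then obfuscate the merged whole.
// `obfuscate` is a placeholder for a conventional obfuscator (compression, renaming, ...).
function bundle(interpreterSource, objectCode, obfuscate = (code) => code) {
  const merged =
    "(function () {\n" +
    "var bytecode = " + JSON.stringify(objectCode) + ";\n" +
    interpreterSource + "\n" +
    "})();";
  return obfuscate(merged);
}

// Trivial stand-in for generated interpreter code: it only reports the bytecode length.
const demoInterpreterSource = "return bytecode.length;";
const jsFileContent = bundle(demoInterpreterSource, [12, 3, 5]);
console.log(jsFileContent);
```

Serializing the object code with JSON.stringify limits this sketch to JSON-representable operands; a real pipeline may encode the object code differently.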
Therefore, on top of dynamic instruction obfuscation, static protection is added by further obfuscating the interpreter together with the target code. This dual defense of dynamic obfuscation plus static protection greatly increases the security strength, disrupts derivation of the instruction-set mapping relation, and guards against an attacker forging a custom interpreter.
Referring to FIG. 4, in some embodiments, the method further includes, after step S101, step S401, subjecting the source code to syntax downgrading.
As one example, higher-version JavaScript syntax (e.g., ES6+) may be converted to lower-version syntax (e.g., ES5), specifically by transcoding with the Babel tool with the target environment set to ES5. For example, const/let is converted to var, arrow functions to ordinary functions, template strings to string concatenation, and so on.
Because downgraded code has fewer syntactic variants, the subsequent AST node types are more uniform, which simplifies the generation logic of the instruction mapping table and reduces the size of its instruction set. For example, an ES6 class is downgraded to prototype syntax, so the virtual machine needs no additional handling of class-related instructions. The simplified instruction set also improves compilation speed. Downgrading further strengthens the obfuscation effect: the downgraded code has lost its higher-version syntactic features (e.g., the conciseness of arrow functions), which, combined with the subsequent obfuscation steps, increases the difficulty of manual analysis and blocks AST restoration attacks, since an attacker cannot infer the source code logic by restoring higher-version features (e.g., reversing prototype code back to a class). In short, this embodiment downgrades the syntax of the source code, simplifies the instruction set through syntax-level compression, improves compilation speed, reduces the size of the interpreter, and strengthens the obfuscation effect.
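To make the effect of downgrading concrete, the pair below shows ES6+ input next to a hand-written ES5 equivalent. It is an illustration only, not actual Babel output; all identifiers are hypothetical.

```javascript
// ES6+ input (before downgrading): arrow function, template string, class.
const greetModern = (name) => `hello ${name}`;
class CounterModern {
  constructor() {
    this.count = 0;
  }
  increment() {
    this.count += 1;
    return this.count;
  }
}

// Hand-written ES5 equivalent (after downgrading): ordinary function,
// string concatenation, constructor function plus prototype method.
var greetLegacy = function (name) {
  return "hello " + name;
};
function CounterLegacy() {
  this.count = 0;
}
CounterLegacy.prototype.increment = function () {
  this.count += 1;
  return this.count;
};

console.log(greetModern("world") === greetLegacy("world")); // true
```

After downgrading, the compiler only has to handle function, assignment and member-access nodes here, rather than separate class, arrow-function and template-literal nodes.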
In summary, the following beneficial effects exist in the embodiment of the present application:
1) Dynamic obfuscation defense based on the instruction mapping table: the instruction mapping table is generated by a pseudo-random algorithm to realize dynamic instruction obfuscation, constructing a code structure that is difficult to predict from historical code, significantly increasing the difficulty of reverse engineering, and effectively preventing static decompilation analysis. In addition, the obfuscation process is pure JavaScript obfuscation: a virtual instruction is not machine code and has no operational meaning, and only needs to be interpreted into a virtual machine instruction by the interpreter. The compiled product is a JS file suitable for running in a pure JS environment and can remain compatible with low-version browsers, which solves the problems of low-version browser incompatibility and conspicuous calling characteristics found in WebAssembly-based protection and detection schemes.
2) Dual protection architecture for the instruction set: the code produced by dynamic instruction obfuscation is obfuscated a second time together with the interpreter, which disrupts derivation of the instruction-set mapping relation and guards against an attacker forging a custom interpreter.
3) Environment-aware anti-debugging mechanism: environment monitoring logic is implanted in the interpreter, and the debugging process is actively disrupted by throwing an exception when an abnormal environment is detected, which greatly increases the time cost of dynamic analysis and blocks the attack path of cloning the virtual machine image.
4) Compilation-optimized obfuscation paradigm: a JS syntax downgrading conversion is designed and combined with dynamic instruction obfuscation, simplifying the instruction set through syntax-level compression so as to improve compilation speed and reduce the size of the interpreter.
In the embodiment of the application, a four-in-one defense matrix of dynamic obfuscation, instruction protection, environment awareness, and compilation optimization is constructed, achieving a balance between security protection strength and execution efficiency while preserving code equivalence.
It is noted that the present specification provides method operational steps as an example or a flowchart, but may include more or fewer operational steps based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one way of performing the order of steps and does not represent a unique order of execution. In practice, the method programs may be executed sequentially or in parallel (e.g., in a parallel processor or multithreaded environment) in accordance with the methods shown in the embodiments or figures.
Based on the same technical concept, in a second aspect, the present application provides a code processing method executed by a client, which may be implemented by the JavaScript engines (e.g., V8, WebKit) of the client's various front-end web pages, applications, and the like loading and executing JavaScript files. Referring to fig. 5, the method includes steps S501-S503:
S501, acquiring an interpreter and target code;
S503, executing a parsing result that is obtained by the interpreter parsing the target code and that contains virtual machine instructions, so as to realize the function corresponding to the source code.
The interpreter matches the instruction mapping table. The instruction mapping table and the interpreter are each generated once in every compilation of the JavaScript source code, and each generated instruction mapping table is unique. The target code contains virtual instructions and is obtained by compiling the source code according to the instruction mapping table; a virtual instruction is used only for mapping and has no actual operational meaning.
The instruction mapping table carries a randomized mapping relation between the operations of the source code and the virtual instructions. The mapping relation is randomized as follows: a plurality of nodes of the abstract syntax tree are randomly ordered, and the sequence number of each node after random ordering is taken as its virtual instruction, thereby establishing the mapping relation between nodes and virtual instructions. In some embodiments, the instruction mapping table includes a mapping relation between nodes of the abstract syntax tree and virtual instructions, and the object code is obtained by querying the instruction mapping table to convert the nodes in a node sequence into the corresponding virtual instructions. The node sequence is formed by arranging, in order, a plurality of nodes of the abstract syntax tree of the source code after preprocessing.
It will be appreciated that the interpreter and the object code are those output by the method embodiments of the first aspect, so further details may be found in the method steps of the first-aspect (development-side) embodiments and are not repeated here.
Referring to FIG. 6, in some embodiments, the interpreter parsing the object code referred to in S503 specifically includes steps S601-S603:
S601, the interpreter reads the target code sequentially;
Specifically, a JavaScript file delivered by a server is obtained; the content of the JavaScript file is the result of obfuscating the interpreter and the object code. The obfuscation applied to the interpreter and the target code is code obfuscation and does not affect recognition by the JavaScript engine.
S603, when a virtual instruction is read, locating the parsing logic corresponding to that virtual instruction, so as to obtain the virtual machine instruction in the parsing logic as the parsing result of the currently read virtual instruction.
For example, in the example of step S105, the resulting object code is a sequence of 12 virtual instructions [12, 3, 5, 12, 4, 6, 1, 12, 5, 30, 1, 0], which the interpreter would read sequentially.
1) The interpreter first reads the virtual instruction "12" and locates the following parsing logic:
if (instruction == 12) {
debugger;
}
The resulting virtual machine instruction is "debugger".
2) The interpreter then reads the next virtual instruction "3" and locates the following parsing logic:
if (instruction == 3) {
t0 = stack.pop();
t1 = bytecode[index++];
if (!t0) index = t1;
}
The resulting virtual machine instruction is:
t0 = stack.pop();
t1 = bytecode[index++];
if (!t0) index = t1;
And so on; finally, a series of virtual machine instructions is obtained, and executing them realizes the function of the source code.
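The read-and-locate cycle of S601 and S603 can be sketched with the case-statement form of the parsing logic. Instructions 12 and 3 follow the parsing logic shown above; instruction 20 (push a constant) and the sample programs are hypothetical additions so that the conditional jump has something to act on.

```javascript
// Minimal dispatch loop: read one virtual instruction, locate its parsing logic, execute it.
function interpret(bytecode) {
  const stack = [];
  let index = 0;
  let t0, t1;
  while (index < bytecode.length) {
    const instruction = bytecode[index++];
    switch (instruction) {
      case 3: // from the text: pop a condition and jump to the operand if it is falsy
        t0 = stack.pop();
        t1 = bytecode[index++];
        if (!t0) index = t1;
        break;
      case 12: // from the text: debugger statement (no effect unless a debugger is attached)
        debugger;
        break;
      case 20: // hypothetical: push the next bytecode element as a constant
        stack.push(bytecode[index++]);
        break;
      default:
        throw new Error("Unknown virtual instruction: " + instruction);
    }
  }
  return stack;
}

// Condition is false, so instruction 3 jumps to position 7 and skips the middle part.
console.log(interpret([20, false, 3, 7, 20, "skipped", 12, 20, "done"])); // ["done"]
// Condition is true, so execution falls through.
console.log(interpret([20, true, 3, 7, 20, "taken", 12, 20, "done"]));    // ["taken", "done"]
```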
When the embodiment of the application is applied to a browser environment, referring to fig. 7, in some embodiments, the method further includes:
S701, executing environment monitoring logic after the interpreter is started, wherein the environment monitoring logic judges whether an attribute of the host environment of the interpreter is a characteristic attribute of an automation framework and, if so, throws an error.
After the browser obtains the JS file delivered by the server, the JavaScript engine of the web page loads the JS file and starts the interpreter in it, which in turn starts the environment monitoring logic. The environment monitoring logic checks, by scanning for specific character strings or regular expressions, whether the attributes on the window and document objects of the host environment belong to an automation framework. Specifically, an attribute mapping table is maintained in the environment monitoring logic; it lists the possible automation framework types and the attributes corresponding to each type (one type may correspond to a plurality of attributes). The environment monitoring logic traverses each automation framework type in the attribute mapping table and, for the type currently traversed, checks whether any attribute corresponding to that type (such as awesomium, cefSharp, __nightmare, or a matching regular expression) appears in the window and document objects of the host environment; if so, it determines and marks the host environment as that automation framework type. For example, if the CefSharp attribute is found in the window object, a possible CefSharp environment is identified. Such processing can block dynamic debugging and automation attacks.
Based on the same technical concept, in a third aspect, referring to fig. 8, an embodiment of the present application further provides a code processing apparatus, which may be specifically a compiler. The code processing apparatus includes respective modules for executing the code processing method of the above first aspect. For example, the code processing apparatus includes:
A code obtaining module 801, configured to obtain JavaScript source code to be protected;
A mapping table generating module 802, configured to generate an instruction mapping table before compiling the source code into the target code each time, where the instruction mapping table generated each time has uniqueness;
The source code compiling module 803 is configured to compile a source code into an object code including a virtual instruction according to a current instruction mapping table, where the virtual instruction is an instruction only used for mapping without an actual operation meaning;
The interpreter generating module 804 is configured to generate an interpreter, where the interpreter is matched with the current instruction mapping table, and is configured to parse a target code generated by current compiling, parse a virtual instruction in the target code into a virtual machine instruction, and the virtual machine instruction is an instruction that can be identified and executed by the JavaScript engine;
An output module 805 for outputting the interpreter and the object code.
The apparatus of the embodiment of the present application may perform the method provided by the embodiment of the present application, and its implementation principle is similar, and actions performed by each module in the apparatus of each embodiment of the present application correspond to steps in the method of the embodiment of the first aspect of the present application, and detailed functional descriptions of each module of the apparatus may be referred to in the foregoing description of the method embodiment of the first aspect, which is not repeated herein.
It should be noted that in the description of the various modules herein, the modules are divided for clarity of illustration. However, in actual implementation, the boundaries of the various modules may be fuzzy. For example, any or all of the functional modules in the present application may share various hardware and/or software elements. As another example, any and/or all of the functional blocks of the present application may be implemented in whole or in part by execution of software instructions by a common processor. In addition, various software sub-modules executed by one or more processors may be shared among various software modules. Accordingly, the scope of the present application is not limited by the mandatory boundaries between the various hardware and/or software elements unless expressly required.
Based on the same technical concept, in a fourth aspect, referring to fig. 9, the embodiment of the application further provides a code processing apparatus, which may specifically be a JavaScript engine installed at the front end. The code processing apparatus comprises means for performing the second aspect or the code processing method in any of the alternative implementations of the second aspect. For example, the code processing apparatus includes:
The acquisition module 901 is used for acquiring an interpreter and an object code, wherein the interpreter is matched with the instruction mapping table, the instruction mapping table and the interpreter are generated once in each compiling process of the JavaScript source code, and the generated instruction mapping table has uniqueness;
The execution module 902 is configured to execute an analysis result obtained by analyzing the object code by the interpreter, where the analysis result includes a virtual machine instruction, so as to implement a function corresponding to the source code, where the virtual machine instruction is an instruction that can be identified and executed by the JavaScript engine.
The apparatus according to the embodiment of the present application may perform the method provided by the embodiment of the present application, and its implementation principle is similar, and actions performed by each module in the apparatus according to each embodiment of the present application correspond to steps in the method according to the embodiment of the second aspect of the present application, and detailed functional descriptions of each module in the apparatus may be referred to in the foregoing description of the method embodiment of the second aspect, which is not repeated herein.
Based on the same technical concept, in a fifth aspect, referring to fig. 10, an embodiment of the present application further provides a computing device 1000, including a memory 1001, a processor 1002, a communication module 1003, an input/output interface 1004, and other components, where connection and communication between the components may optionally be implemented by a bus 1005. The memory 1001 is for storing computer programs or instructions which, when executed by the processor 1002, implement the method steps in any of the method embodiments of the first or second aspect. It should be noted that the structure of the computing device 1000 shown in fig. 10 is only schematic and does not limit the device to which the method provided in the embodiment of the present application is applied.
The specific entity of the computing device may be a server or a computer at the development end, for implementing the method steps in any of the method embodiments of the first aspect, or may be a computer at the client end, for implementing the method steps in any of the method embodiments of the second aspect.
The memory 1001 may be used to store an operating system and computer programs or instructions which, when invoked by the processor 1002, implement the method of the first or second aspect of the present application; the memory 1001 may also store programs for implementing other functions or services. The memory 1001 includes at least one type of computer-readable storage medium, including flash memory, hard disk, multimedia card, Random Access Memory (RAM), Static Random Access Memory (SRAM), Read Only Memory (ROM), magnetic disk, optical disk, and the like. In some embodiments, the computer-readable storage medium may be an internal storage unit of the computing device, such as a hard disk or memory of the computing device. In other embodiments, the computer-readable storage medium may also be an external storage device of the computing device, such as a plug-in hard disk, Secure Digital (SD) card, or flash memory card provided on the computing device. Of course, the computer-readable storage medium may also include both internal storage units and external storage devices of the computing device. In this embodiment, the computer-readable storage medium is typically used to store software installed on the computing device, such as program code of the method embodiments of the first aspect or the second aspect. Furthermore, the computer-readable storage medium may also be used to temporarily store various types of data that have been output or are to be output.
The processor 1002 is connected to the memory 1001 through the bus 1005 and performs the corresponding functions by calling the application programs stored in the memory 1001. In some embodiments, the processor 1002 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other chip. The processor 1002 is generally configured to control the overall operation of the computing device, such as performing control and processing related to data interaction or communication with other entities. In this embodiment, the processor 1002 is used to execute program code or process data stored in the memory 1001.
The computing device 1000 may connect to a network through a communication module 1003 (which may include, but is not limited to, components such as a network interface) to enable interaction of data, such as sending data to or receiving data from other devices (e.g., user terminals or servers, etc.) through the network. The communication module 1003 may include a wired network interface and/or a wireless network interface, etc., that is, the communication module may include at least one of a wired communication module or a wireless communication module.
The computing device 1000 may be connected to required input/output devices, such as a keyboard and a display device, through the input/output interface 1004; the device 1000 itself may have a display device and may also be externally connected to other display devices through the interface 1004. It is understood that the input/output interface 1004 may be a wired interface or a wireless interface. Depending on the actual application scenario, a device connected to the input/output interface 1004 may be a component of the device 1000, or may be an external device connected to the device 1000 when needed.
The bus 1005 used to connect the components may include a path for transferring information between the components. The bus 1005 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 1005 may be classified into an address bus, a data bus, a control bus, and the like according to function.
Based on the same technical idea, the embodiments of the present application further provide a computer-readable storage medium in which a computer program or instructions are stored which, when executed by a processing device, implement the method steps in any of the method embodiments of the first aspect or the second aspect. For more details, reference may be made to the method embodiments, which are not described here again. In this embodiment, the computer-readable storage medium may be nonvolatile or volatile. Computer-readable storage media include flash memory, hard disk, multimedia card, Random Access Memory (RAM), Static Random Access Memory (SRAM), Read Only Memory (ROM), magnetic disk, optical disk, and the like. In some embodiments, the computer-readable storage medium may be an internal storage unit of a computing device, such as a hard disk or memory of the computing device. In other embodiments, the computer-readable storage medium may also be an external storage device of a computing device, such as a plug-in hard disk, Secure Digital (SD) card, or flash memory card provided on the computing device. Of course, the computer-readable storage medium may also include both internal storage units and external storage devices of the computing device. In this embodiment, the computer-readable storage medium is typically used to store software installed on the computing device, such as program code of the method embodiments of the first aspect or the second aspect. Furthermore, the computer-readable storage medium may also be used to temporarily store various types of data that have been output or are to be output.
Based on the same technical idea, embodiments of the present application also provide a computer program product or a computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the computer device performs the method provided by the method embodiments of the first aspect or the second aspect.
It should be noted that, the description order of the embodiments of the present application is not limited to the priority order of the embodiments. Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
While the embodiments of the present application have been described above with reference to the drawings, the present application is not limited to the specific embodiments described, which are merely illustrative and not restrictive. Those of ordinary skill in the art may make many modifications without departing from the spirit of the present application and the scope of the appended claims; all equivalent changes made using the content of the specification and drawings, and all direct or indirect applications in other related technical fields, fall within the protection of the present application.