Copyright Peter T. Breuer 2017
This is the toolchain for my prototype obfuscating (encrypting) C
compiler: HAVOC. `Obfuscating' means that all the data values at
runtime are different each time you recompile the same source, in a flat
distribution. The point of this compiler is to show that such barbarity
is not only possible, but highly plausible, indeed practical.
Encryption is currently virtual: at execution it is only indicated by an
E[..] wrapped around the unencrypted values. Notionally, though, all data
is encrypted, and so are the constants in machine code instructions.
WHAT "HAVOC" MEANS:
Highly Abstract Very Obfuscating Collection (of C compiler, assembler,
etc). That follows the GNU gcc convention of the "c" in the acronym
being for "collection" rather than "compiler", because one also needs a
linker, assembler, etc. So the C compiler is "havoc_cc".
REQUIREMENTS:
You need Haskell's ghc compiler and development environment. The
homepage is https://www.haskell.org/ghc/ and I am running v7.6.3
on Debian v8.9 Linux. The interpreter ("hugs") is useful,
and the parser generator ("happy") and lexer generator ("alex") can be
useful if you want to fiddle with some generated files. To use the
compiler you need the development libraries in the list below.
I did "apt-get install ghc ... hugs happy".
You also need a bit of a POSIX environment, enough to run the Makefile:
make [v4.0], install [v8.23] and sh [bash v4.3.30]. Alternatively, you
can easily read the Makefile and run its commands by hand. Those tools
should mostly be on your machine already.
I did "apt-get install make coreutils bash".
ghc: 7.6.3-21 (necessary)
libghc-mtl-dev: 2.1.2-4 (necessary)
libghc-transformers-dev: 0.3.0.0-5 (necessary)
libghc-base-dev: 4.6.0.1 (necessary)
libghc-random-dev: 1.0.1.1-3 (necessary)
libghc-parsec3-dev: 3.1.3-3 (necessary)
libghc-text-dev: 0.11.3.1-1 (necessary)
libghc-bytestring-dev: 0.10.0.2 (necessary)
libghc-time-dev: 1.4.0.1 (necessary)
libghc-array-dev: 0.4.0.1 (necessary)
libghc-deepseq-dev: 1.3.0.1 (necessary)
libghc-prim-dev: 0.3.0.0 (necessary; is wired into ghc 8.4)
libghc-integer-gmp-dev: 0.5.0.0 (necessary)
make: 4.0-8.1 (necessary)
coreutils: 8.23-4 (necessary)
bash: 4.3-11 (necessary - or just sh or ksh)
gcc: 4.9.2-2 (necessary)
cpp: 4.9.2-10 (necessary)
binutils: 2.25-5 (necessary)
libc6-dev: 2.19-18 (necessary)
libbsd-dev: 0.7.0-2 (necessary)
libgmp-dev: 6.0.0 (necessary)
libffi-dev: 3.1-2 (necessary)
hugs: 98.200609.21 (nice)
alex: 3.1.3-1 (nice)
happy: 1.19.4-1 (nice)
COMPILE:
Unpack the archive, change into the Haskell source directory ("hs")
that it creates, and type
make
That will make
havoc_cc -- (simple) C compiler
havoc_as -- assembler
havoc_ld -- linker
havoc_vm -- virtual machine
havoc_od -- disassembler
havoc_rc -- recompiler/obfuscator
havoc_test -- regression tests (not compiled for user)
INSTALL:
You don't need to do anything to play. Putting the executables
in /usr/local/bin may eventually be nice.
RUN:
A typical use sequence is
./havoc_cc -s test_prog.c -- compile C (.c) to assembler (.s)
./havoc_as -c test_prog.s -- assemble the assembler to object code (.o)
./havoc_ld -o test_prog test_prog.o
-- link object code into an executable
./havoc_vm test_prog -- run executable in a virtual machine
That generates a trace of the running code. The final return from a
subroutine should leave the "answer" in register v0 (the 32-bit return
register), like this:
...
314 add v0 t0 zer E[1788011673] v0 = E[2] <-- here
328 lw ra E[-2](fp) ra = E[0]
33c cmov sp fp fp sp = E[0]
340 lw fp E[-1](sp) fp = E[0]
354 jr ra
STOP
In this case, the answer is 2 (encrypted).
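As an aside on reading the marked line: the constant 1788011673 inside
the add instruction is large, yet v0 ends up as 2. Here is a minimal
sketch of arithmetic that would produce this. It assumes (a guess, not
something stated above) that the instruction computes
v0 = t0 + zer + constant with 32-bit wraparound, that zer is zero, and
that t0 held the answer shifted down by the same constant. The function
names are made up for illustration and are not part of HAVOC.

```c
#include <stdint.h>

/* Hypothetical model of "add v0 t0 zer E[k]": the result is the sum of
   both source registers and the embedded constant k, modulo 2^32.
   This semantics is an assumption made for illustration. */
uint32_t add_with_const(uint32_t a, uint32_t b, uint32_t k)
{
    return a + b + k;   /* unsigned arithmetic wraps modulo 2^32 */
}

/* Hypothetical helper: shift a value down by an offset (modulo 2^32),
   so that a later add of the same offset restores it. */
uint32_t shift_down(uint32_t x, uint32_t offset)
{
    return x - offset;
}
```

Under these assumptions t0 would have held 2506955625 (that is,
2 - 1788011673 modulo 2^32) just before the add, and
add_with_const(2506955625, 0, 1788011673) gives back 2.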
You will find some things to compile (and maybe run) in the "examples"
directory. They're just my testing and development C source files, so
apologies. Some of them are _meant_ to loop forever, and some are
_meant_ to do very little. You should be able to tell which.
C code is restricted approximately as follows:
1. Only 1-D arrays
2. No global variables
3. Function prototypes use empty parentheses (), not a list of arg types.
4. No varargs/stdargs.
5. No precompiler
6. The only base type is "int" (32-bit)
7. No functions returning void
8. Declarations only at the beginning of a block
9. ***** Pointers are restricted by declaration to a defined array *****
int A[100];
restrict A int *ptr;
...
Overstepping array bounds is not detected, but you are not meant to do it.
Things Will Go Wrong (TM) if you do.
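To make the restrictions concrete, here is a small sketch of a function
written to stay inside them: a single 1-D array, no globals, no
preprocessor lines, only "int", a non-void return, and all declarations
at the top of the block. It assumes that ordinary "if", "while" and
integer arithmetic are accepted, which the list above does not rule out
but does not confirm either. The function name is invented for
illustration and has not been checked against havoc_cc.

```c
/* Sum 1*1 + 2*2 + ... + n*n through a 1-D array (restriction 1).
   n is clamped to the array size because overstepping bounds is
   not detected. */
int sum_of_squares(int n)
{
    int squares[16];    /* declarations come first (restriction 8) */
    int i;
    int total;

    if (n > 16) {
        n = 16;
    }

    /* Fill the array with the first n squares. */
    i = 0;
    while (i < n) {
        squares[i] = (i + 1) * (i + 1);
        i = i + 1;
    }

    /* Add them up and return an int (restrictions 6 and 7). */
    total = 0;
    i = 0;
    while (i < n) {
        total = total + squares[i];
        i = i + 1;
    }
    return total;
}
```

Pointers are deliberately left out: the "restrict A int *ptr;"
declaration is specific to this compiler, so a sketch that uses it
would not compile anywhere else. The function is defined before any
use, so no prototype is needed.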