If you are treating inputs as code, you have bigger issues to worry about.

madsbuch · on Feb 27, 2020

What is the exact (philosophical?) distinction between data and code? When you extract the host name from an email address to be able to send an email, you are interpreting a string. We could call this process executing a program that outputs the host name.

roelschroeven · on Feb 27, 2020

Treating input as data (good):

    def echo():
        user_input = input('enter some text: ')
        print('your text: {}'.format(user_input))

Treating input as code (VERY BAD, DON'T DO THIS):

    def echo():
        user_input = input('enter some text: ')
        command = "print('your text: {}')".format(user_input)
        exec(command)

The second example allows the user to do all kinds of unintended stuff:

    enter some text: '); print(10**2000) #
    your text:
    1000000000000000000000...

(abridged)

That would print however many zeroes the user specified, and use a whole lot of memory. With some creativity it's possible to cause lots of havoc.

jfkebwjsbx · on Feb 27, 2020

No, that is data, not code.

You don't seem to understand the distinction between running algorithms on data and taking programs as input, which is what the GP talked about.

throwaway373438 · on Feb 27, 2020

I suspect the opposite is true; the parent comment you're replying to is making subtle and insightful commentary on the nature of code and the futility of suggesting that inputs ought not be "code."

Any sufficiently complex program can be viewed as an interpreter for its inputs. Input into a calculator program is code which programs an equation. Input into a word processor is code which programs a document. Input into a video game is code which programs a real time simulation. Input into a compiler is code which programs an executable. These are all different types of executable code sequences.
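
To make the calculator case concrete, here is a minimal sketch of a postfix calculator that "runs" its input without ever handing it to `exec` or `eval`. The names `OPS` and `run_rpn` are made up for illustration; this is one possible reading of the point above, not a reference implementation.

```python
import operator

# Hypothetical illustration: the only "instructions" this interpreter knows.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def run_rpn(program: str) -> float:
    """Interpret a whitespace-separated postfix expression such as '2 3 + 4 *'."""
    stack = []
    for token in program.split():
        if token in OPS:
            if len(stack) < 2:
                raise ValueError(f"not enough operands for {token!r}")
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            # Anything that is not a known operator must parse as a number;
            # float() raises ValueError otherwise.
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]
```

The input is a program in a tiny language, but that language has four operators and nothing else, so the user can only ask for what the interpreter was built to do.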

jfkebwjsbx · on Feb 27, 2020

I am a theoretical computer scientist. I can appreciate the insightfulness (on the surface) of that commentary.

However, the fact that modern computers can be exploited due to architectural and engineering decisions (e.g. memory unsafety) does not mean a separation between code and data is not possible.

In fact, how to cheaply bend current practices back to that model is precisely a hot topic, given how rampant vulnerabilities are in the wild.

madsbuch · on Feb 28, 2020

My comment was not limited to the realm of hardware, ISAs and microcode. It was much more general.

If you never treat data as code, you can only do uninteresting things. My example was the email address. The instant you look into the "black box" of the string, you are starting to treat the string as an executable structure. An email address' raison d'etre is to provide that: an address to send an email to. You cannot do that without looking into it.

Now, from here we can discuss safe and unsafe ways of doing that. You could use string splits or whatnot, or you could use a parser combinator library. Doing the latter will make it easy to see that parsing and executing a program is not that different from parsing an email address into an AST, (user, hostname), and then treating that as a higher order program (i.e. we need to specialize it with a message before we can execute it as a "send email" program).
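
A rough sketch of that idea, using the plain string-split route rather than a parser combinator library. `Address`, `parse_address` and `send_program` are hypothetical names, the parsing is deliberately simplistic (real address syntax is far messier), and nothing is actually delivered; the "send" step only returns a description of what it would do.

```python
from typing import Callable, NamedTuple

class Address(NamedTuple):
    """The tiny 'AST' for an address: (user, hostname)."""
    user: str
    hostname: str

def parse_address(raw: str) -> Address:
    # Simplifying assumption: everything after the last '@' is the host name.
    user, sep, hostname = raw.rpartition("@")
    if not sep or not user or not hostname:
        raise ValueError(f"not a valid address: {raw!r}")
    return Address(user, hostname)

def send_program(address: Address) -> Callable[[str], str]:
    """Return a 'program' that still needs a message before it can run."""
    def send(message: str) -> str:
        # Stand-in for real delivery: describe the action instead of performing it.
        return f"deliver to {address.hostname} for {address.user}: {message}"
    return send
```

Calling `send_program(parse_address("alice@example.com"))` gives back a function; only once it is applied to a message does anything "execute".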

dikei · on Feb 27, 2020

No, he made a good point. It's a matter of perspective: what is considered data in one situation can be considered code in other situations. Hell, they even created the NX bit to prevent malicious memory from becoming executable.

throwaway373438 · on Feb 27, 2020

Mixing code and data is an inherent attribute of von Neumann architecture computer systems.

kyralis · on Feb 27, 2020

It depends on context. The question is like asking "what's the difference between 'x + x' and '5'"; the former is 'code', an algorithm with placeholders for data; the latter is data.

In another context, though, the function 'f(x) = x + x' might itself be data. Within a given context, however, the answer is generally unambiguous: data is what is acted upon by code.
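
A small sketch of the same function playing both roles; `double` and `apply_twice` are names invented for the example.

```python
def double(x):
    # In this context double is code: it acts on the data x.
    return x + x

def apply_twice(f, value):
    # In this context f is data: it is passed in and acted upon by apply_twice.
    return f(f(value))

result = apply_twice(double, 5)  # double(double(5))
```

Within each call the roles are unambiguous, even though `double` switches sides between them.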

(The breaking of this distinction is one of the reasons self modifying code is Bad.)

mikekchar · on Feb 27, 2020

This reminds me of the trick of reprogramming a computer game by exploiting side effect bugs on controller sequences. There is a YouTube video somewhere where somebody writes an entirely new game simply using input sequences on the controller of an existing game. There were bugs that wrote values to various memory locations and by being exceptionally clever you could write assembly code. I wish I could find a link to it...

Hasnep · on Feb 27, 2020

There are lots of examples, but a famous one is this video by SethBling where he uses a controller as opposed to a TAS tool: https://youtu.be/hB6eY73sLV0

jfkebwjsbx · on Feb 27, 2020

Yes, it is a consequence of von Neumann archs.

No, the fact that current archs encode data and code in the same memory space and there are vulnerabilities does not mean the separation is not possible.