Why is Copilot so bad?
Posted Jul 4, 2022 11:47 UTC (Mon) by bluca (subscriber, #118303)
In reply to: Why is Copilot so bad? by LtWorf
Parent article: Software Freedom Conservancy: Give Up GitHub: The Time Has Come!
Ahah, if that were an actual issue, I'd have been fired a long time ago, I can assure you.
> (and have an economical interest in claiming so)
I am not in GH and I am not a shareholder, so you can park that nonsensical tinfoil-hattery straight away. I am simply a free software developer and a happy user of Copilot for a year now, unlike the vast majority of commenters here, who have obviously never seen it outside of a couple of memes, I might add.
> To respond to your comment, no, having your license terms respected is not "bleak".
It would be incredibly bleak: nobody outside of a few major corporations would ever be able to build AI/ML software, besides some boring indexing or suchlike, as it would be de facto impossible to compile a legal training corpus unless you had a metric ton of private code available to you. That would be dreadful, and I am happy the law is going in a different direction, with the original license being irrelevant for AI training, as that's better for everyone.
> Microsoft would be very free to train copilot on their internal code but didn't… don't you find that interesting? Instead they chose to build copilot on other people's works, which are indeed copyrighted.
It's not interesting at all; in fact it's quite boring and obvious. Copilot is trained on plenty of MSFT's own code, namely all of it that is publicly available on GitHub (there's loads), as the team has said multiple times in public, because that's where the training data comes from. If code lives on different systems (external or internal), it wasn't used; it's as simple as that. I don't even know whether the GH org can access those other systems, but from my own experience, I'm pretty sure they cannot, even if they wanted to.
> The law to train a ML model doesn't say anything about using that model to generate new content.
Lawmakers were clearly and openly talking about AI applications in general, not just indexing or similar activities. A giant chunk of AI R&D is in the field of generating content, like GPT and so on. It seems like a bold assumption to think that the lawmakers weren't aware of all that.