I would feel a lot less conflicted about these ML tools if they were way more legible. Like, give me a sidepane annotating the sources from which a suggestion was derived.
Then I could make a better decision based on that context. Like, when I google for an algorithm, I can see from the surrounding page whether it's a pretty common construct or whether I'm about to copy/paste from some significantly complex code tainted by Oracle.
I also chafe at the "they're selling GPL code for $10/month!" argument.
Like, no, GitHub is charging to maintain APIs for IDEs and for a trained model that takes a lot of compute to construct. IANAL, but that smells transformative to me?
At worst, it smells derivative in the way a search index is derivative - which folks seem mostly okay with? I mean, at least GitHub wants to charge money rather than surveil like Google?
Sure, with a carefully constructed query, the system will recite training data verbatim, like a search result - i.e. it'll spit out John Carmack's fast inverse square root code. But so will Google.
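(For the curious, that's this famous Quake III Arena snippet - exactly the kind of short, distinctive code a model can memorize and regurgitate. Reproduced here from memory with my own comments; the original type-punned through a pointer cast on `long`, which I've swapped for memcpy so it's well-defined on modern compilers:)

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* The Quake III Arena fast inverse square root: approximate 1/sqrt(x)
 * by reinterpreting the float's bits as an integer, subtracting from a
 * magic constant for a first guess, then refining with Newton-Raphson. */
static float q_rsqrt(float number)
{
    const float threehalfs = 1.5f;
    float x2 = number * 0.5f;
    float y = number;
    uint32_t i;

    memcpy(&i, &y, sizeof i);             /* float bits -> integer */
    i = 0x5f3759df - (i >> 1);            /* the magic constant */
    memcpy(&y, &i, sizeof y);             /* integer bits -> float */
    y = y * (threehalfs - (x2 * y * y));  /* one Newton-Raphson step */
    return y;
}

int main(void)
{
    /* sanity check: 1/sqrt(4) should be ~0.5 */
    printf("q_rsqrt(4.0) = %f\n", q_rsqrt(4.0f));
    return 0;
}
```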
Personally, I'm fine if bots snarf up my 300+ GitHub repositories. They're garbage. That said, opt-out-by-default is stinky web 2.0 move-fast-break-things thinking. This stuff could use a robots.txt convention to take ALLOW / DENY signals during training - something like the sketch below.
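To be clear, this file and every directive in it are made up - nothing parses this today; it's just what the convention could look like:

```
# Hypothetical training-policy file, in the spirit of robots.txt.
# None of these directives exist yet; all names are invented.

User-Agent: *              # any model-training crawler
Training: DENY             # default: do not train on this repo

User-Agent: example-code-model
Training: ALLOW            # this particular trainer may use the code
Attribution: required      # and must surface provenance in suggestions
```

Same handshake robots.txt already gives search crawlers, just extended to a training-time ALLOW / DENY signal - and an attribution directive would feed the legibility side pane from the top of this thread.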
Maybe I'll extract all this into a blog post that no one will read. Or that will get posted somewhere as discourse-dogpile target practice.
@lmorchard Just get GPT-3 to write the blog post for you
@george (it didn't really understand the assignment, though)