I would feel a lot less conflicted about these ML tools if they were way more legible. Like, give me a sidepane annotating the sources from which a suggestion was derived.
Then, I could make a better decision about context. Like, when I google for an algorithm, I can see in the surrounding page whether this is a pretty common construct or whether I'm about to copy/paste from significantly complex code tainted by Oracle.
And also I chafe at the "they're selling GPL code for $10/month!" argument.
Like, no, github is charging to maintain APIs for IDEs and for the trained model that takes lots of compute to construct. IANAL, but that smells transformative to me?
At worst to me it smells derivative like a search index - which folks seem mostly okay with? I mean, at least Github wants to charge money rather than surveil like Google?
Sure, with a carefully constructed query, the system will recite a search result - i.e. it'll spit out John Carmack's fast inverse square root code. But, so will google.
Personally, I'm fine if bots snarf up my 300+ github repositories. They're garbage. That said, opt-out is stinky web2.0 move-fast-break-things thinking. This stuff could use a robots.txt convention to take ALLOW / DENY signals during training.
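For what it's worth, a hypothetical sketch of what that convention might look like, borrowing robots.txt syntax. None of this is a real standard: the crawler name and the `Allow`/`Disallow` reading of ALLOW / DENY signals are invented for illustration.

```text
# /robots.txt (hypothetical): per-crawler training signals
User-agent: example-ml-trainer   # invented training-crawler name
Disallow: /                      # DENY: don't train on anything here

User-agent: *
Allow: /                         # ALLOW: everyone else, business as usual
```

The nice part of piggybacking on robots.txt is that every crawler already knows where to look for it; the open question is whether trainers would honor it.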
What makes this feel like less of an Upton-Sinclair-flavored conflict to me is that I'd be interested in exploring work with these machine learning tools whether or not I worked at github.
To be clear, I do not myself work on copilot. But, that github is building this stuff was an attractor. So, at least, I think I come to the brain damage honestly?
Might be shiny toy brain worms, but dang it feels exciting to consider the creative potential of rubber ducking and riffing with machines. Especially as someone with a lot of social anxiety who's exhausted by pairing with humans.
I realize the machine output is kind of pairing-with-humans at an async remove, but that takes the sting out of it. That's the core of my attachment to it, as probably my entire lifelong attachment to computers has also been. 😅
@lmorchard the problem with the transformative argument is that there's no transparency. probably most of it will be transformative, and a lot of it won't be, and it's impossible to tell the difference because like you said it strips away context.
@technomancy Yeah, that's troubling to me. I would like more legibility. In my mind, ultimately this thing is a tool contributing to my work. It's making suggestions but I make the decisions.
Currently I kind of patch over that by only really accepting small bits of help (e.g. unit tests) or giving the code as good a hard stare as I would something I found over on stackoverflow.
@lmorchard I'm wondering when the first lawsuit will come. "Can't blame me for this code that caused someone's death, the AI wrote it."
@mossop Yeah, I'd like to see that kind of lawsuit and would hope they lose horribly.
A big thing I'm taking from all this is that the system has no agency. It's generating predictive text from my input and it's up to me whether or not to accept it.
@lmorchard this is an interesting perspective, thanks for sharing it.
I would imagine the attribution would be a mess. I'm not sure it's possible to draw a line from a particular input to a particular output when using machine learning. Even if you could it might be something like 1/10,000 from source A, 7/100,000 from source B, on and on for thousands of inputs.
@bcgoss I'm less versed in the ML algorithms than I'd like, but I'd like to imagine legible attribution might get easier the more specific the code suggestion gets.
Which could also be a warning sign that it's "filing the serial numbers off" something specific and to decide not to accept the suggestion.
That said, a legible process seems like a non-trivial change from the ground up.
@lmorchard I've made neural nets as a hobbyist, and one attempt at a k-nearest neighbor classifier for work. In both, the algo has a complex set of weighted equations. The weights are tuned to the "right" value using "reinforcement". Given an input and an expected output, weights are changed one direction if the actual output matches the expected output, another direction if it's different.
In the end all you have are the weights and the algorithm, no real connection to the inputs.
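That tuning loop can be sketched in a few lines of Python. This is a toy single-neuron example of my own (not bcgoss's actual code), and the learning rate and epoch count are arbitrary:

```python
def train(pairs, lr=0.1, epochs=100):
    """pairs: list of ((x1, x2), expected) with expected in {0, 1}."""
    w = [0.0, 0.0]  # the weights: the only thing that survives training
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), expected in pairs:
            actual = 1 if (w[0] * x1 + w[1] * x2 + b) > 0 else 0
            err = expected - actual  # zero when they match: weights stay put
            w[0] += lr * err * x1    # otherwise, nudge each weight in the
            w[1] += lr * err * x2    # direction that shrinks the error
            b += lr * err
    return w, b

# Teach it an AND gate from four input/expected-output pairs.
weights, bias = train([((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)])
```

After training, all that's left is `(weights, bias)`; the training pairs themselves are gone, which is exactly the "no real connection to the inputs" problem for attribution.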