
Reading some github copilot discourse and trying to decide if the thoughts & feelings I have about it are Upton-Sinclair-flavored based on my salary. Also shiny toy. Also I think it is the thin edge of the wedge of future ML trends and not a flash in the pan like "web3"

I would feel a lot less conflicted about these ML tools if they were way more legible. Like, give me a sidepane annotating the sources from which a suggestion was derived.

Then, I could make a better decision about context. Like, when I google for an algorithm, I can see in the surrounding page whether this is a pretty common construct or whether I'm about to copy/paste from significantly complex code tainted by Oracle.

And also I chafe at the "they're selling GPL code for $10/month!" argument.

Like, no, github is charging to maintain APIs for IDEs and for the trained model that takes lots of compute to construct. IANAL, but that smells transformative to me?

At worst to me it smells derivative like a search index - which folks seem mostly okay with? I mean, at least Github wants to charge money rather than surveil like Google?

Sure, with a carefully constructed query, the system will recite a search result - i.e. it'll spit out John Carmack's fast inverse square root code. But, so will google.

Personally, I'm fine if bots snarf up my 300+ github repositories. They're garbage. That said, opt-out is stinky web2.0 move-fast-break-things thinking. This stuff could use a robots.txt convention to take ALLOW / DENY signals during training.
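To sketch what I mean (these directive names and the "ml-training" agent are made up for illustration, not any real standard), something robots.txt-flavored like:

```
# hypothetical robots.txt-style training signals -- not a real convention
User-agent: *
Allow: /

# a training crawler could be told what it may and may not ingest
User-agent: ml-training
Deny: /experiments/
Allow: /
```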

What makes this feel like less of an Upton-Sinclair-flavored conflict to me is that I'd be interested in exploring work with these machine learning tools whether or not I worked at github.

To be clear, I do not myself work on copilot. But, that github is building this stuff was an attractor. So, at least, I think I come to the brain damage honestly?

Might be shiny toy brain worms, but dang it feels exciting to consider the creative potential of rubber ducking and riffing with machines. Especially as someone with a lot of social anxiety who's exhausted by pairing with humans.

I realize the machine output is kind of pairing-with-humans at an async remove, but that takes the sting out of it. That's the core of my attachment to it, as probably my entire lifelong attachment to computers has also been. 😅

Maybe I'll extract all this out into a blog post that no one will read. Or that will get posted somewhere for discourse dogpile target practice

@lmorchard I'm really interested in the outcome when rights owners start suing over models trained on their copyrighted works.

On one hand it's clearly derivative. On the other, it also hits some of the fair use notes (transformative, isn't competing with the original market)

@george Yeah, like I kind of want to see it dragged through the courts so we can get it worked out ASAP. Don't want to be a lightning rod for the discourse, but I've got contrary notions.

Like, for one, it doesn't have any agency to commit code or even save a file - I do that.

If it spits out literal copypasta, it's up to me to decide whether or not to incorporate the suggestion. Beyond it being a much better tool, I'm having trouble seeing an essential distinction between that and how I use stackoverflow or plain vanilla code search.

(I say "having trouble seeing" because I'm open / eager to be convinced otherwise)

@lmorchard @george

None of the examples that I've seen for copilot thus far seem really compelling to me... it doesn't seem to rise above solving leet code problems or undergraduate programming problems.

Solving clearly defined, side-effect-free functions is not a hard problem that I need a tool for. Understanding the relationships and behaviors of large piles of legacy spaghetti code with many undocumented dependencies and strange infrastructure edge cases is; designing a whole module of new classes and deciding how they model the domain is. Writing the code itself is, if anything, boring, and it must be understood a priori or else the design is going to devolve into more spaghetti.

@lordbowlich @george Having used it daily since February, I think the sweet spot for it right now is skipping ahead through drudgery.

Like, I start to write a comment and the function signature for a unit test and startlingly often it will fill in a complete & correct function body. When it does that a half-dozen times, it's saved me a half-hour of boilerplate annoyance.
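To make that concrete (the slugify function and this test are invented for the example, and the "suggested" body is the flavor of thing it produces, not a captured transcript):

```python
import re

def slugify(text: str) -> str:
    # Tiny stand-in for the code under test, made up for this example.
    text = text.strip().lower()
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")

# What I actually type is the comment plus the test signature...
def test_slugify_strips_punctuation_and_lowercases():
    # ...and the suggestion fills in a body roughly like this,
    # which I still review before accepting:
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  Already--slugged  ") == "already-slugged"
    assert slugify("") == ""
```

Multiply that by a test file's worth of cases and the saved boilerplate adds up.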

@lmorchard the problem with the transformative argument is that there's no transparency. probably most of it will be transformative, and a lot of it won't be, and it's impossible to tell the difference because like you said it strips away context.

@technomancy Yeah, that's troubling to me. I would like more legibility. In my mind, ultimately this thing is a tool contributing to my work. It's making suggestions but I make the decisions.

Currently I kind of patch over that by only really accepting small bits of help (e.g. unit tests) or giving the code as good a hard stare as I would something I found over on stackoverflow.

@george (it didn't really understand the assignment, though)

@lmorchard I'm wondering when the first lawsuit will come. "Can't blame me for this code that caused someone's death, the AI wrote it."

@mossop Yeah, I'd like to see that kind of lawsuit and would hope they lose horribly.

A big thing I'm taking from all this is that the system has no agency. It's generating predictive text from my input and it's up to me whether or not to accept it.

@lmorchard this is an interesting perspective, thanks for sharing it.

I would imagine the attribution would be a mess. I'm not sure it's possible to draw a line from a particular input to a particular output when using machine learning. Even if you could it might be something like 1/10,000 from source A, 7/100,000 from source B, on and on for thousands of inputs.

@bcgoss I'm less versed in the ML algorithms than I'd like, but I'd like to imagine legible attribution might get easier the more specific the code suggestion gets.

Which could also be a warning sign that it's "filing the serial numbers off" something specific and to decide not to accept the suggestion.

That said, a legible process seems like a non-trivial change from the ground up

@lmorchard I've made neural nets as a hobbyist, and took one stab at a k-nearest-neighbor classifier for work. In both, the algorithm is a pile of weighted equations. The weights get tuned toward the "right" values by feedback: given an input and an expected output, each weight is nudged in whichever direction moves the actual output closer to the expected one, over and over across the training data.

In the end all you have are the weights and the algorithm, no real connection to the inputs.
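A toy version of that, just to show why the training examples themselves don't survive (one weight, made-up numbers, purely illustrative):

```python
# Toy, one-weight model trained the way described above: nudge the weight
# whenever the actual output misses the expected output.
def train(pairs, steps=200, lr=0.1):
    w = 0.0  # the whole "model" is this one number
    for _ in range(steps):
        for x, expected in pairs:
            error = w * x - expected
            w -= lr * error * x  # move w so the error shrinks
    return w

w = train([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
print(round(w, 3))  # ~2.0 -- only the weight remains; the training pairs are gone
```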
