Random insight of the night: every couple years, someone stands up and bemoans the fact that programming is still primarily done through the medium of text. And surely with all the power of modern graphical systems there must be a better way. But consider:

* the most powerful tool we have as humans for handling abstract concepts is language
* our brains have several hundred millennia of optimizations for processing language
* we have about five millennia of experimenting with ways to represent language outside our heads, using media (paper, parchment, clay, cave walls) that don't prejudice any particular form of representation, at least in two dimensions
* the most wildly successful and enduring scheme we have stuck with over all that time is linear strings of symbols. Which is text.

So it is no great surprise that text is well adapted to our latest adventure in encoding and manipulating abstract concepts.

@rafial Both accurate and also misses the fact that Excel is REGULARLY misused for scientific calculations and near-programming level things since its GUI is so intuitive for doing math on things.

Like, GUI programming is HERE, we just don't want to admit it due to how embarrassing it is.

@Canageek very good point. Excel is actually the most widely used programming environment by far.

@rafial Now what we need to do is make a cheap, easy-to-use version of it that is designed for what scientists are actually using it for: column labels, semantic labels, faster calculations, better handling of mid-sized data (in the tens-of-thousands-of-data-points range), etc.

@Canageek I'm wondering, given your professional leanings, if you can comment on the use of "notebook"-style programming systems such as Jupyter and of course Mathematica. Do you have experience with those? And if so, how do they address those needs?


Thanks @urusan, I found the article interesting, and it touched on the issue of how to balance the coherence of a centrally designed tool with the need for something open, inspectable, non-gatekept, and universally accessible.

PDF started its life tied to what was once a very expensive, proprietary tool set. The outside implementations that @Canageek refers to were crucial in it becoming a universally accepted format.

I think the core idea of the computational notebook is a strong one. The question for me remains whether we can arrive at a point where a notebook created 5, 10, 20 or more years ago can still be read and executed without resorting to software archeology. Even old PDFs sometimes break when viewed through new apps.

@rafial @urusan Aim for longer than that. I can compile TeX documents from the 80s, and I could run ShelX files from the 60s if I wanted to.

@Canageek @urusan oh hey, I'm all for longer; eventually we'll need centuries. But I've also seen the hair-pulling it can take to revive archived source code from even a couple years back, and I realize we've got to start by getting to the point where even a decade is a reliable and expected thing.

@rafial @urusan Fair, though I'd say source code is pointless and what we need is more focus on good, easy access to the raw data.

If you can't reproduce what was done from what is in the paper, you haven't described what you've done well enough, and redoing it is better than just rerunning code: a bug might have been fixed between software versions, you might notice something not seen in the original, etc.

@urusan @rafial That would be my favoured approach. Raw data plus a good set of standard statistical tools. More basic analysis = less care if the EXACT toolchain is lost.

If you can't redo the analysis in alternatives then it isn't good science in the first place.

@urusan @rafial No, they've kept updating the software since then so it can use the same input files and data files. I'm reprocessing the data using the newest version of the software using the same list of reflections that was measured using optical data from wayyyy back.

The code has been through two major rewrites in that time, so I don't know how much of the original Fortran is the same, but it doesn't matter? I'm doing the calculations on the same raw data as was measured in the 60s.

There is rarely a POINT to doing so rather than growing a new crystal, but I know someone who has done it (he used Crystals rather than Shelx, but he could do that as the modern input file converter works on old data just fine)

@urusan @Canageek one other thing to keep in mind is that data formats are in some ways only relevant if there is code that consumes them. Even with a standard, at the end of the day a valid PDF document is, by de facto definition, one that can be rendered by extant software. Similar with ShelX scripts. To keep the data alive, one must also keep the code alive.

@rafial @urusan No, what you need is a good description of how the data was gathered. Analysis is just processing and modeling and can be redone whenever. As long as you know enough about the data.

There are *six* programs I can think of that can process hkl data and model it (shelx, crystals, GSAS-II, Jana, olex2), so it doesn't REALLY matter which you use, or if any of them are around in ten years, as long as there is *A* program that can do the same type of modeling or better (reading the same input file is a really good idea as well, as it makes things easy)

If a solution is physically relevant any program should be able to do the same thing.

@rafial @urusan Standardized data formats are more important than software.

Simple, standardized analysis is better than fancy, complicated work.

@rafial @urusan @Canageek And this is why all software should be written in FORTRAN-77 or COBOL.

@mdhughes @rafial @urusan I mean, that is why Shelx's first major version came out in 1965 and the most recent one in 2013 (the last minor revision was 2018)

I mean, modern versions of Fortran aren't any harder to write than C, which is still one of the most used programming languages on the planet. I don't see why everyone makes fun of it.

@Canageek @rafial @urusan I'm kind of not making fun of Fortran, though the last time I saw any in production it was still F-77, because F-90 changed something they relied on and was too slow; I last worked on some F-77 for the same reason ~30 years ago.

I am indeed making fun of COBOL, but it'll outlive us by thousands of years as well.

Stable languages are good… but also fossilize practices that we've improved on slightly in the many decades since.

@mdhughes @rafial @urusan Isn't Fortran-90 like three versions old now? I know I used it in 2005 because you could talk to F77 with it and we had certified hydrodynamics code in Fortran 77 that was never going to be updated due to the expense of recertifying a new piece of code

@Canageek @rafial @urusan Yes, newer Fortrans are actually useful for multithreading (F-77 can only be parallel on matrix operations, IIRC). And yet I expect F-77 to be the one that lasts forever.

@Canageek @mdhughes @urusan @rafial Ok, that's it. I need to check this ShelX thing out.

> SHELX is developed by George M. Sheldrick since the late 1960s. Important releases are SHELX76 and SHELX97. It is still developed but releases are usually after ten years of testing.
This is amazing.

@clacke @mdhughes @urusan @rafial yeah, the big worry is that George Sheldrick is getting very, very old, and people wonder whether anyone will take over maintaining and improving the software when he dies. Luckily its largest competitor does have two people working on it, the original author and a younger professor, so it has a clear succession path.

@mdhughes @Canageek @urusan @rafial Any language that has a reasonably-sized human-readable bootstrap path from bare metal x86, 68000, Z80 or 6502 should be fine.

They don't exist. Yet. Except Forth and PicoLisp.

Also I'd add standard Scheme and standard CL to the list. You can still run R4RS Scheme code from 1991 in Racket and most (all? is there a pure R5RS implementation?) modern Schemes. CL hasn't been updated since 1994.

@urusan @clacke @mdhughes @rafial See, this is a lot of focus on getting the exact same results, which for science I think is a mistake.

You don't want the same results, you want the *best* results. If newer versions of the code use 128-bit floating point numbers instead of 64-bit, GREAT. Fewer rounding errors.
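The rounding-error point is easy to see in miniature. This is just a toy Python illustration (not crystallography code): the more carefully a sum is accumulated, the less numerical noise ends up in the result.

```python
import math

# Summing 0.1 ten thousand times. The exact answer is 1000, but
# 0.1 has no exact binary representation, so naive accumulation
# drifts a little with every addition.
vals = [0.1] * 10_000

naive = sum(vals)          # error compounds at each step
careful = math.fsum(vals)  # compensated summation, correctly rounded

print(naive == 1000.0)     # False: tiny accumulated error
print(careful == 1000.0)   # True
```

Higher precision doesn't change what the analysis *means*; it just shrinks the numerical noise around the physically meaningful answer.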

It's like, I can create this model in Shelx or Crystals. They don't implement things EXACTLY the same, but a good, physically relevant model should be creatable in either. If I try to do the same thing in two sets of (reliable) software and it doesn't work in one, perhaps I'm trying to do something without physical meaning?

Like, it shouldn't matter if I use the exact same Fourier transform or do the analysis in R, SAS, or Python. It should give the same results. Stop focusing on code preservation and focus on making analysis platform-agnostic.
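That platform-agnostic idea can be sketched in a few lines: here's a hedged toy example in Python computing the same statistic two independent ways (a naive two-pass formula versus Welford's one-pass algorithm, standing in for "two different tools"). If the analysis is well-defined, both implementations agree to within floating-point precision.

```python
import math

data = [2.1, 2.9, 3.2, 2.7, 3.0, 2.8]

# "Tool 1": naive two-pass mean and sample standard deviation
mean1 = sum(data) / len(data)
var1 = sum((x - mean1) ** 2 for x in data) / (len(data) - 1)
sd1 = math.sqrt(var1)

# "Tool 2": Welford's one-pass algorithm, a different code path entirely
n, mean2, m2 = 0, 0.0, 0.0
for x in data:
    n += 1
    delta = x - mean2
    mean2 += delta / n
    m2 += delta * (x - mean2)
sd2 = math.sqrt(m2 / (n - 1))

# Different implementations, same well-defined quantity:
# results agree to numerical precision.
assert abs(mean1 - mean2) < 1e-12
assert abs(sd1 - sd2) < 1e-12
```

If two independent implementations of a well-defined analysis *don't* agree like this, that disagreement itself is the interesting result, which is the thread's point about bugs.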

@urusan @clacke @mdhughes @rafial If you can only do your analysis with ONE specific piece of code on ONE platform, how do you know your results aren't due to a bug?

Also it is going to be *helllll* for someone in 20 years. I know a grad student in physics who has to revisit some code his prof wrote when he was in grad school. On the upside, it is apparently well documented. On the downside, the documentation is all in Polish, as that is the prof's first language and where he went to grad school, whereas the grad student only speaks English.

Now nuclear physics is a bit of an exception, but asdfljk that sounds like hell.

@clacke @mdhughes @urusan @rafial To be fair, you could also compare new code to the published results. That way you can tell they both produce the same results (or close enough) on the range it has been published on.

@clacke @Canageek @urusan @rafial And in a thousand years, will Polish still exist in any recognizable form? So now you've got two archaeology problems.

At least keep your language spec with the code so there's some Rosetta Stone.

@Canageek @mdhughes @urusan @rafial You want to first know that you are getting the exact same results in the part of the analysis that is supposed to be deterministic. *Then* you can upgrade things and see differences and identify whether any changes are because you broke something or because the new setup is better.

If the original setup had bugs, you want to know that, and you want to know why, and you won't be able to do that if you can't reproduce the results.

@urusan @clacke @mdhughes @rafial Yeah, but aren't compilers for F77 and ANSI C still being made for everything under the sun?

Sheldrick has said the reason his code has been so easy to port to everything is that he only used a minimal subset of Fortran when he wrote it.

I'm interested in how things like Fortran and C and LaTeX have stayed so popular and usable after so long. I wanted to read the Nethack 1.0 guidebook and it came as a .tex file, so I just ran pdflatex on it and boom, usable PDF, something like 30 years later, with no fuss. And yet try opening ANY OTHER file format from the 90s.

@urusan @clacke @mdhughes @rafial Exactly. So science can piggyback off of them for high-level work, but no one seems to, as demonstrated by how many versions of Python I have installed.

@urusan @clacke @mdhughes @rafial That is fair; I'm from an area of science where you don't go into other people's work like that very often. We are far more likely to remake a compound and do all the measurements over again than we are to try to figure out what someone else did wrong.

If we find a difference between our results and the published ones the older ones probably had an impurity or something and it isn't really worth worrying about. Heck, sometimes you even get COLOUR differences when you make literature compounds, like white crystals vs red crystals.

@urusan @clacke @mdhughes @rafial (Or just use C or Fortran. They are the first languages ported to any platform ever made, and if you don't have any external dependencies they'll run on *anything* *forever*. That is how Shelx works; the writer calls it the zero-dependency philosophy.)

@Canageek @mdhughes @urusan @rafial It is a good philosophy. But C and Fortran are low-level languages. If you can afford to take on *one* dependency that offers a high-level language, the code will be better to review, share and modify.

It's too bad e.g. Julia has ~40 dependencies listed at github.com/JuliaLang/julia/blo… . Maybe making e.g. Julia lighter and more bootstrappable would be a good investment for future science.

@clacke @mdhughes @urusan @rafial Right, but high-level languages tend to change a lot over time *looks at Python, where our lab computers have three or four versions installed, one each for Bruker, PyMol, and GSAS-II, and the latest one for everything else*

@Canageek @mdhughes @urusan @rafial New and popular languages change a lot over time.

If people valued longevity and reproducibility I think there would be a market for something more stable. Something based on a proven standard, maybe with something like a scipy/Julia/R subset on top, only the most boring functions that everyone uses and the use cases and best-practice interfaces for which are well known since the 90s or longer.

I don't know what people do with these things, but matrix multiplication, certain statistics operations, etc.

Or did I just describe the Fortran ecosystem? 😉

@urusan @clacke @mdhughes @rafial I've been tempted to stop teaching myself Python and learn something more stable like Lua instead, but everyone else is using Python, and it gets more painful to use every year.

I used to just download an exe of Pymol and run an installer, and now I need to use some garbage called pip, and heaven help you if you use the wrong set of install instructions or run pip instead of pip3 or vice versa.

Then there is the crystallography software that hasn't updated its install instructions since 1999, where you have to manually add a bunch of stuff to the PATH and manually tell it where your web browser, POV-Ray, IrfanView, and text editor executables are, but I'm confident it will still work next year.

@urusan @Canageek @mdhughes @rafial There will definitely be emulators for these CPUs on any future mainstream CPU. I wouldn't want to run it that way, but if I'm in a pinch and need to get this ancient language going, I'll be able to.

The bootstrap is something that can be verified to exist. Whether the spec of the language is sufficiently good or not is more subjective. Ideally the language would have both.