The scientist-programmer is in many ways a peculiar being. Most of what you do each day is programming, yet the reason for your (professional) existence is to produce good science. We think it’s worth just taking a moment to think about this.
Lots of prototyping
The nature of science means you’ll probably be doing a lot of what is effectively prototyping. The goal of the scientist-programmer is often to figure out the best way to solve a particular problem, but then they often must move on to the next problem to be solved. Perhaps you need a good set of scripts to process a given data-set, or perhaps you’re trying out ideas for new statistical models. You may even just be looking for a way to efficiently implement known methods on a particularly large data-set. All prototyping!
Lots of data processing
Scientists handle data. Often an awful lot of it. This is even more true for the scientist-programmer, as quite often you will have become one specifically because you have so much data that you need a computer to help you process it.
Particular skills are needed to write code that can handle large volumes of data and/or do complicated things to them. You may well have to be frugal with your memory and/or CPU resources, because (for example) an extra copy of the data will fill your computer’s RAM, making your code grind to a halt. As a result, you can end up having to think seriously about ways to optimise your code.
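As a rough illustration of being frugal with memory, here is a minimal Python sketch of our own (not taken from any particular project). It computes a mean and sample variance in a single pass using Welford's algorithm, so the data never has to sit in RAM all at once. The function name is made up for this example, and we assume your data can be supplied as any iterable of numbers, such as a generator reading a file one line at a time.

```python
def running_mean_and_variance(values):
    """Return (mean, sample variance) of an iterable in one pass.

    Uses Welford's algorithm, so only three numbers are held in
    memory regardless of how many values are consumed.
    """
    count = 0
    mean = 0.0
    sum_sq_dev = 0.0  # running sum of squared deviations from the mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        sum_sq_dev += delta * (x - mean)
    if count < 2:
        raise ValueError("need at least two values for a sample variance")
    return mean, sum_sq_dev / (count - 1)


# Usage: a generator expression means no list of values is ever built.
mean, variance = running_mean_and_variance(float(i) for i in range(1, 5))
```

Whether a streaming approach like this suits your problem depends on what you need to compute; some methods really do need the whole data-set in memory at once.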
There may also be mathematical operations that you use a lot. Inverting a 10^4 x 10^4 matrix requires a lot of computing resources, so you need to pick a good way to do it (hint: use LAPACK, or a similar library). Or perhaps you’re taking Fourier transforms and need to know that Fast Fourier Transform (FFT) algorithms are faster when the size of your data array is 2^n (where ‘n’ is an integer). These are things with which the scientist-programmer must be intimately familiar.
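To make the FFT point concrete, here is a small sketch (again our own, with hypothetical function names) that finds the next power of two at or above a given length and zero-pads a list of samples up to that size. Treat it as an illustration under assumptions: zero-padding changes the frequency grid of the transform, so check that it is appropriate for your analysis, and note that many modern FFT libraries also cope well with sizes that have only small prime factors.

```python
def next_power_of_two(n):
    """Return the smallest power of two that is >= n (for n >= 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 1 << (n - 1).bit_length()


def pad_to_power_of_two(samples):
    """Return a new list: samples followed by zeros up to a 2^n length."""
    target_length = next_power_of_two(len(samples))
    return list(samples) + [0.0] * (target_length - len(samples))


padded = pad_to_power_of_two([0.5] * 1000)
print(len(padded))  # 1024
```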
And of course, you must be extra-careful of the “10% error” type bugs in your numerical code. There are many places for such bugs to hide, and consider this: do you really want to be stood in front of 200 eminent scientists at a conference when the world-leader in your field spots that your graph must be off by 10%? We thought not.
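One way to flush out this kind of bug is to run your numerical code on a problem where you already know the answer. The sketch below is our own toy example rather than a recipe: a simple trapezoidal integrator is checked against the integral of sin(x) from 0 to pi, which is exactly 2. The function names, the number of steps and the tolerance are all assumptions chosen for illustration.

```python
import math


def trapezoid(f, a, b, n):
    """Approximate the integral of f over [a, b] using n equal steps."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def check_against_known_answer():
    """Fail loudly if the integrator drifts from the analytic result."""
    estimate = trapezoid(math.sin, 0.0, math.pi, 1000)
    # The exact value is 2; a stray factor in the code would break this.
    assert math.isclose(estimate, 2.0, rel_tol=1e-5), estimate
    return estimate


check_against_known_answer()
```

A check like this will not catch every mistake, but a missing factor or an off-by-one in the loop shows up immediately rather than at the conference.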
“Just get the science done”
The primary tension in the life of the scientist-programmer is the urge to “just get the science done”. What we mean by this is the conflict between getting the coding done as quickly as possible so that you can move on to finishing the science, versus spending that extra week (or however long) making sure that your code is neat, tidy, well-tested and generally a glorious triumph of software engineering.
This can be hard for a number of reasons. Firstly, you may yourself be impatient to get on with the science as that’s your ultimate aim. Fair enough. Just make sure your results will pass the above “conference test”. By contrast, you might be a bit of a coding perfectionist (we can certainly relate to that) and not want to stop improving the code until it’s perfect. We suggest that this is admirable up to a point, but that you need to know when your code’s good enough.
Finally, there’s the issue of managing your manager and/or collaborators. How do you convince them that your code needs more work, despite the fact it seems to be producing results? This is a tough one and we suspect that ultimately you have to both strike a balance and also communicate well (and regularly) with the people with whom you work. Explain that another week of testing means you can write a section in the paper proving that the code (and hence all your results) is solid. If you’ll be using the code many times, explain how two days of well-judged optimisation will save many days of run-time at later dates, so all your science will progress more quickly. If your code can really use the extra time, it should be possible to present a case that’s compelling to anyone who’s reasonably open-minded about it.
On this subject, beware of people making statements like “oh, it only needs *blah* doing to it”, implying your task is small and should be quick to complete. This is often hard to refute without resorting to “you just don’t know what you’re talking about!”. Which isn’t very polite, even if it’s true :-) Try to see it from the other person’s perspective, and remember that estimating the time required for any project is very hard to do, so it’s not anyone’s fault. But if you think it will take longer, say so!
We’ve talked before about surviving scientific legacy code. It’s been our experience that such code can be pretty horrific to deal with, but quite often the author will have been highly expert at the implemented method (even if their coding skills were deeply average), so it can be the case that your best course of action is to grin and bear it. Sorry.
If you’re using someone else’s legacy code, access to the original author is very helpful. It might even be vital. Sometimes you can save hours, days or even more effort by taking half an hour to sit down with the author and have them explain their thinking for a particular part of the code. Sometimes what looks like a bug or inefficient piece of code is in fact clever and subtle, so much so that you’ve not spotted what it’s doing. Try not to destroy these, if you can avoid it!
Coming back six months later
You may well put a project down for six months, then come back to it. Perhaps you’ve finally heard back from the referees on a paper and need to make some revisions. Maybe you just got sidetracked onto another project. Either way, the scientist-programmer can have many projects on the go at once, so it pays to prepare accordingly. Primarily, this means always leaving your code in a state that makes it easy to pick up again after a break. Good practice here is vital. If your code is self-documenting, then you’ll find it a lot easier to remember what it is you were doing six months ago when you last worked on the code in question. This is a case where a bit of investment of time at an early stage can save a much larger amount of time when you come to re-start a project.
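What counts as self-documenting will vary from project to project, but as a small sketch of the idea (a made-up example of ours, with hypothetical names and units), compare a bare expression like `c * 3600 / t` with a function whose name, arguments, constant and docstring tell you what it is for:

```python
SECONDS_PER_HOUR = 3600.0


def count_rate_per_hour(event_count, exposure_seconds):
    """Return the event rate in counts per hour.

    event_count      -- number of events recorded during the exposure
    exposure_seconds -- length of the exposure, in seconds (must be > 0)
    """
    if exposure_seconds <= 0:
        raise ValueError("exposure_seconds must be positive")
    return event_count * SECONDS_PER_HOUR / exposure_seconds


rate = count_rate_per_hour(event_count=50, exposure_seconds=1800.0)
print(rate)  # 100.0
```

The arithmetic is identical in both versions; the difference is that the second one still makes sense when you open the file six months later.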
Publishing your results
Your results will (hopefully) get published in a scientific paper, so it’s worth bearing in mind what you might need in order to do this. Often, the plots, graphs, tables and/or speed-trials that you might want to include in a paper are also great tests to prove that your code works, so trying to build these in as you go is a good idea. You do test your code, right?
Making software tools
The scientist-programmer is typically not paid to write code. That’s a by-product of producing science, which is what they’re paid to do. This is not to say, however, that taking the time to turn your prototyped ideas into proper software tools is pointless. Quite the opposite.
Taking the extra time to produce really awesome (publicly-available) software tools can be a great thing to do. It helps the scientific community, because they can benefit from all your hard work in developing the method in the first place. You can also gain a good level of kudos if you do a good job. It’s a Good Thing to be well-regarded in your field and this is one way to achieve that. We also think that it’s a real shame that some very clever ideas never make it past the stage of being scripts in someone’s home directory, because they’ve written the paper and moved on. Look at something like the R programming language to see how powerful it can be when a whole community of scientists contributes software packages.
Of course you need to take responsibility for your code. Be a good maintainer. If people find bugs in the code, fix them as promptly as you can and thank them for their input; they’re only trying to be helpful and everyone benefits as a result. It’s also very useful if there’s a published paper that people can refer back to. And of course, the scientist-programmer is in the business of writing good scientific papers, so it’s win-win.
Scientist-programmers are both scientists and programmers. The best ones work very hard to be world-class at both disciplines. This isn’t easy, but modern science needs people who are expert at both. There’s a lot of enjoyment to be had from the combination, and there’s a lot of great science that you can do as a result.