Now the problem is not that these people ask for difficult things. Mostly what they want turns out to be fairly trivial, if you can ever work out what it is that they want. The problem is not even that they tend to be asking for scripts to do things that you just shouldn't want to do. That's manageable: ours not to reason why, ours just to look faintly puzzled and say "But that's what you asked for" when it does the wrong thing. No, the real problem is that these people don't just move the goalposts, they run around like headless chickens with the bloody goalposts strapped to their backs.
"Can you write a script to match these SGML records against these other records, by ID number?"
"Well, the first lot of records are ... er ... here. There's one record per file, so that's about 200,000 files."
"The second lot are here. They're all in one file, one citation per line, so that's about 25,000 lines. They're in a peculiar pseudo-MARC format."
"Okay. And there's an ID in the first lot that matches an ID in the second lot?"
"We-ell... sort of. There's this field
"The field ends when you get to the next Ë."
"And this field is in every record."
"Yeah. Er, except when it isn't."
"Sometimes you have to just match the title, which should be nearly the same in both."
"And if the title doesn't quite match, you'll need the routine to check the author as well, you know, to make sure. Though the name forms might be different."
"And if they do match, or if they sort of match, you probably want to check the date as well."
"The dates in this lot are in a standard 8-digit thing."
"And the dates in the other lot...?"
"And the output, I mean the records which might match, can they be sorted in order of, like, how likely it is they match?"
"Well, you'll have to define what that means."
"Yeah. Well, it doesn't really matter, it's just, like, if we could add up how many fields match, and how much they match, and then sort them in order of that."
"I'll ... see what I can do."
"Oh, and, I meant to ask... it would be really helpful if before doing this, you could write a script to sort the original records into separate files by journal title."
"There's an ID number which corresponds to the journal."
"Or, actually, if you could sort them into files by journal title and issue number, that would be good."
"Right. Will they all have an issue number?"
"Er ... no."
"Will they all have a journal ID number?"
"Er ... probably."
"Right. So, 200,000 files sorted --"
"-- oh, sorted by publication date within the files, yeah, please."
"... sorted into separate files by journal title and issue number, within which they're sorted by date."
"Yeah, because that way it'll make it much easier for the freelancers to start matching records by hand while you're writing this script."
Back in again today, so that I can get that testing done; the dev server is now fixed, but Magic Geoff has to fix all the other stuff that needs fixing before anybody can test the alpha, which he couldn't do yesterday because the server was down... so I still have nothing officially to do, which means people still keep asking me for crap Perl scripts.
This job is even more dead-end than it was before I "quit". It's not even adding anything to my CV any more, except time. ("Except my life. Except my life. Except my life.")
"What is there in the bag?"