Superjobs guillimin
Superjobs
A "superjob" is a job that runs on one of the normal queues and executes other jobs, submitted to a faked queue, one after the other.
It runs until the requested wallclock time is used up or until it has not found any job to execute for a certain time.
NEVER KILL A SUPERJOB!!! See below for more information.
A superjob is a very useful tool for executing post-processing jobs. It makes the automatic submission of post-processing jobs by the model independent of guillimin's "moods": no jobs get lost or have to be resubmitted by hand.
How to start a "superjob"
The command to submit a superjob is "u.run_work_stream":
u.run_work_stream [-instances n] -t mseconds -cpus number_of_cpus -name stream_name -maxidle nseconds -queues q1 q2 ... qn [--] "arguments_for_ord_soumet"
Anything found after '--' is passed verbatim to ord_soumet. These arguments_for_ord_soumet may include -q, -jn, and any other relevant argument.
Submission example:
u.run_work_stream -t 2592000 -cpus 1 -name superjob_1a -maxidle 36000 -queues sj1 -- -q sw -jn superjob_1a
In this case a superjob with the name superjob_1a will get submitted.
'-name' is the internal name of the superjob, '-jn' the name of the listing.
For simplicity I suggest keeping the two names the same.
Make sure to NEVER HAVE TWO SUPERJOBS WITH THE SAME NAME running at the same time. Once a superjob has finished, however, you can submit a new one with the same name.
The superjob will get submitted for '-t 2592000' seconds (30 days) on '-cpus 1' cpu to the queue '-q sw'.
If it does not find a job to execute for '-maxidle 36000' seconds it will terminate itself.
The superjob will execute jobs which got submitted to the faked queue '-queues sj1'. You can name the faked queue any way you want.
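The values for '-t' and '-maxidle' are given in seconds, so it can help to compute them rather than type them by hand. The following is only a sketch: the helper functions days_to_seconds and hours_to_seconds are hypothetical and not part of u.run_work_stream, and the command is printed instead of executed so you can check it first.

```shell
# Hypothetical helpers (not part of u.run_work_stream): convert to seconds.
days_to_seconds()  { echo $(( $1 * 24 * 60 * 60 )); }
hours_to_seconds() { echo $(( $1 * 60 * 60 )); }

walltime=$(days_to_seconds 30)    # 30 days, the value used for -t above
maxidle=$(hours_to_seconds 10)    # 10 hours, the value used for -maxidle above

# Print the submission command from the example so it can be reviewed first.
echo "u.run_work_stream -t $walltime -cpus 1 -name superjob_1a -maxidle $maxidle -queues sj1 -- -q sw -jn superjob_1a"
```

The printed line should match the submission example above; copy it to the prompt once you are happy with it.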
How to send jobs to the "faked" queue
At the moment only jobs running on 1-4 cores can get executed by a superjob. This can easily be changed; just let Katja or Michel know.
For example, to have all jobs that are submitted to run on 1 core executed by the superjob submitted above, instead of actually being submitted, set the environment variable:
QUEUE_1CPU=sj1@
You can export this variable in your ~/.profile.d/.batch_profile:
export QUEUE_1CPU=sj1@
The '@' at the end is very important. This tells 'soumet' that this is a faked queue and not a real one.
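Since a forgotten '@' is easy to miss, a small check can be added to your profile. This is a minimal sketch: the function is_faked_queue is a hypothetical helper, not part of 'soumet', and it only tests that the value is a non-empty name followed by '@'.

```shell
# Hypothetical helper: succeed if the queue name ends in '@' (faked queue).
is_faked_queue() {
    case "$1" in
        ?*@) return 0 ;;   # at least one character, then '@'
        *)   return 1 ;;
    esac
}

QUEUE_1CPU=sj1@
if is_faked_queue "$QUEUE_1CPU"; then
    echo "QUEUE_1CPU=$QUEUE_1CPU points to a faked queue"
else
    echo "QUEUE_1CPU=$QUEUE_1CPU looks like a real queue (no trailing '@')" >&2
fi
```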
What will happen
Once the environment variable QUEUE_1CPU is set to 'sj1@' all jobs submitted on 1 cpu will not actually get submitted. Instead a link to them will get created in the directory:
~/.job_queues/sj1-1
A superjob "picking" from queue 'sj1' will check if there is a link in this directory. If yes, it will execute the corresponding job.
If you see the links in this directory piling up you can submit a second, third, ... superjob, executing jobs from the same faked queue. Just make sure to use a different name for each superjob you submit!
It does make sense to submit such "extra" superjobs with a very short '-maxidle' time.
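To see whether links are piling up, you can count them. The function count_pending below is a hypothetical helper, written as a sketch; it counts the symbolic links directly inside the directory you give it and ignores hidden files and regular files.

```shell
# Hypothetical helper: count the symbolic links directly inside a directory.
count_pending() {
    n=0
    for f in "$1"/*; do
        # An empty or missing directory leaves the pattern unexpanded;
        # the -L test then fails and nothing is counted.
        if [ -L "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Directory used for the faked queue 'sj1' in the example above.
echo "waiting jobs: $(count_pending "$HOME/.job_queues/sj1-1")"
```

If the number keeps growing over time, that is a sign that an extra superjob might help.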
How to elegantly terminate a superjob
As mentioned above: Never kill a superjob with 'qdel'!
Every superjob has a config file:
~/.job_queues/.active_superjob_name_*.1
You can edit this file and set for example:
MaxIdle=0
As soon as there are no more jobs to be executed, the superjob will gracefully terminate itself.
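If you prefer not to open an editor, the change can be scripted. The sketch below assumes that the config file already contains a line starting with 'MaxIdle=' and holds one setting per line; the function set_maxidle is a hypothetical helper, not a tool provided on guillimin. If no such line is found, the file is left untouched.

```shell
# Hypothetical helper: set MaxIdle in a superjob config file.
# Assumes one "Key=value" setting per line and an existing MaxIdle line.
set_maxidle() {
    file=$1
    value=$2
    if ! grep -q '^MaxIdle=' "$file"; then
        echo "no MaxIdle line in $file; edit it by hand" >&2
        return 1
    fi
    tmp=$(mktemp) || return 1
    sed "s/^MaxIdle=.*/MaxIdle=$value/" "$file" > "$tmp"
    cat "$tmp" > "$file"    # overwrite in place, keeping the file's permissions
    rm -f "$tmp"
}

# Usage (replace the path with the config file of your own superjob):
# set_maxidle ~/.job_queues/.active_superjob_name_*.1 0
```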