Superjobs guillimin: Difference between revisions

An article from Informaticiens département des sciences de la Terre et l'atmosphère

Revision as of 9 November 2012, 21:10

Superjobs

A "superjob" is a job that runs on one of the normal queues and executes, one after the other, jobs that were submitted to a faked queue.
It runs until its requested wallclock time is used up or until it has found no job to execute for a certain time.

NEVER KILL A SUPERJOB!!! See below for more information.

A superjob is a very useful tool for executing post-processing jobs. It makes the automatic submission of post-processing jobs by the model independent of guillimin's "moods". No jobs get lost or have to be resubmitted by hand.


How to start a "superjob"

The command to submit a superjob is "u.run_work_stream":

  u.run_work_stream [-instances n] -t mseconds -cpus number_of_cpus -name stream_name -maxidle nseconds -queues q1 q2 ... qn [--] "arguments_for_ord_soumet"

  arguments_for_ord_soumet may include -q, -jn, and any other relevant argument.

Submission example:

  u.run_work_stream -t 2592000 -cpus 1 -name superjob_1a -maxidle 36000 -queues sj1 -- -q sw -jn superjob_1a

In this case a superjob with the name superjob_1a will get submitted.
'-name' is the internal name of the superjob, '-jn' the name of the listing.
For simplicity I suggest keeping the two names the same.
Make sure to NEVER HAVE TWO SUPERJOBS WITH THE SAME NAME running. But once a superjob has finished you can submit a new one with the same name.

The superjob will get submitted for '-t 2592000' seconds (30 days) on '-cpus 1' CPU to the queue '-q sw'.

If it does not find a job to execute for '-maxidle 36000' seconds (10 hours), it will terminate itself.

The superjob will execute jobs that were submitted to the faked queue '-queues sj1'. You can name the faked queue any way you want.
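
Since '-t' and '-maxidle' are given in plain seconds, it can help to compute them rather than type them by hand. The following is only a minimal sketch: days_to_seconds and hours_to_seconds are hypothetical helper names and not part of u.run_work_stream. The command is printed for review instead of being submitted.

```shell
# Hypothetical helpers (not part of u.run_work_stream):
# convert whole days or hours to the seconds expected by -t and -maxidle.
days_to_seconds()  { echo $(( $1 * 24 * 60 * 60 )); }
hours_to_seconds() { echo $(( $1 * 60 * 60 )); }

walltime=$(days_to_seconds 30)    # wallclock limit for the superjob
maxidle=$(hours_to_seconds 10)    # idle time before it terminates itself

# Print the submission command from the example above for review;
# nothing is submitted here.
echo "u.run_work_stream -t $walltime -cpus 1 -name superjob_1a -maxidle $maxidle -queues sj1 -- -q sw -jn superjob_1a"
```

Once the printed line looks right, it can be copied and run as is.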


How to send jobs to the "faked" queue

At the moment only jobs running on 1-4 cores can be executed by a superjob. But this can easily be changed. Just let me or Michel know.

For example, to have all jobs that are submitted to run on 1 core executed by the superjob submitted above, instead of actually being submitted to a real queue, set the variable:

  QUEUE_1CPU=sj1@

You can export this variable in your ~/.profile.d/.batch_profile:

  export QUEUE_1CPU=sj1@

The '@' at the end is very important. It tells 'soumet' that this is a faked queue and not a real one.
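
Because a missing '@' is easy to overlook, a small check in the shell can catch it before any job is sent. This is a sketch only: is_faked_queue is a hypothetical helper written for illustration, not something provided by 'soumet', and it tests nothing more than the trailing '@'.

```shell
# Hypothetical helper: succeeds when the queue name ends in '@',
# the marker for a faked queue described above.
is_faked_queue() {
  case "$1" in
    *@) return 0 ;;
    *)  return 1 ;;
  esac
}

# Use the exported value if there is one, otherwise the example value.
QUEUE_1CPU=${QUEUE_1CPU:-sj1@}

if is_faked_queue "$QUEUE_1CPU"; then
  echo "QUEUE_1CPU=$QUEUE_1CPU ends in '@': treated as a faked queue"
else
  echo "QUEUE_1CPU=$QUEUE_1CPU has no trailing '@': treated as a real queue" >&2
fi
```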