Running job monitor: Difference between revisions
Revision as of 22 November 2011, 17:39
under construction
A running job can be monitored/interrogated with a local utility called
u.job-monitor
There are 2 ways to activate this utility:
- at job submission time:
  ord_soumet .... -prolog jobmonitor ....
- with an explicit command in the job itself:
  u.job-monitor &
caveat: in the case of an MPI job the only node that will be monitored is node 0 (primary node)
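As an illustration of the second activation method, here is a minimal sketch of a job script that starts the monitor itself. The function run_model is a hypothetical stand-in for the real work of the job, and the availability check is an assumption added so that the script still runs on hosts where u.job-monitor is not installed.

```shell
#!/bin/bash
# Sketch only: run_model is a hypothetical placeholder for the real job.
run_model() {
    echo "model step done"
}

# Start the monitor in the background, as described above, but only if
# the utility can be found on this host (assumption: it is on the PATH).
if command -v u.job-monitor >/dev/null 2>&1; then
    u.job-monitor &
    monitor_started=yes
else
    monitor_started=no
fi

# The job proper runs while the monitor works in the background.
job_output=$(run_model)
echo "$job_output"
```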
the job monitor uses 3 files found in directory $HOME/top_in_batch for each monitored job
- node_jobid.top
  refreshed every 10 seconds with the output of a top command for processes belonging to the user
- node_jobid.cmd
  if the user writes a line in this file then
  - this line is executed on the primary node
  - the output (stdout and stderr) of said command is appended to the node_jobid.out file
  - the node_jobid.cmd file is erased and re-created
- node_jobid.out
  - the output of the command from the node_jobid.cmd file
where node will be replaced by the host name of the primary node of the job
and jobid will be replaced by the PBS job id of said job
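The three files can be used from any interactive shell that sees the same $HOME. The helpers below are a sketch under stated assumptions: the function names (jobmon_file, jobmon_send, jobmon_top, jobmon_out) and the JOBMON_DIR variable are hypothetical and not part of u.job-monitor, and the node and job id in the usage comments are made-up values.

```shell
# Directory used by the job monitor; JOBMON_DIR is a hypothetical override.
JOBMON_DIR="${JOBMON_DIR:-$HOME/top_in_batch}"

# Print the path of one monitor file.
# usage: jobmon_file node jobid suffix   (suffix is top, cmd or out)
jobmon_file() {
    printf '%s/%s_%s.%s\n' "$JOBMON_DIR" "$1" "$2" "$3"
}

# Write one command line into the .cmd file so that the monitor
# executes it on the primary node.
# usage: jobmon_send node jobid command [args...]
jobmon_send() {
    cmd_file=$(jobmon_file "$1" "$2" cmd)
    shift 2
    printf '%s\n' "$*" >> "$cmd_file"
}

# Show the latest top snapshot of the user's processes.
jobmon_top() { cat "$(jobmon_file "$1" "$2" top)"; }

# Show the accumulated stdout and stderr of the commands sent so far.
jobmon_out() { cat "$(jobmon_file "$1" "$2" out)"; }

# Example session (made-up node name and job id):
#   jobmon_send node0 12345 'df -h .'
#   jobmon_out  node0 12345
#   jobmon_top  node0 12345
```

How quickly a line written to the .cmd file is picked up is not stated here, so a short wait before reading the .out file may be needed.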