Running job monitor

From Informaticiens département des sciences de la Terre et l'atmosphère

Revision as of 22 November 2011, 17:39

under construction

A running job can be monitored/interrogated with a local utility called

u.job-monitor


There are two ways to activate this utility:

  • at job submission time:
    ord_soumet ....  -prolog jobmonitor ....
  • with an explicit command in the job itself
    u.job-monitor &


Caveat: for an MPI job, only node 0 (the primary node) is monitored.


For each monitored job, the job monitor uses 3 files found in the directory $HOME/top_in_batch:

  • node_jobid.top
    refreshed every 10 seconds with the output of a top command for processes belonging to the user
  • node_jobid.cmd
    if the user writes a line in this file then
    • this line is executed on the primary node
    • the output (stdout and stderr) of said command is appended to the node_jobid.out file
    • the node_jobid.cmd file is erased and re-created
  • node_jobid.out
    • the output of the command from the node_jobid.cmd file

Here node is replaced by the host name of the primary node of the job, and jobid by the PBS job id of that job.
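
The file protocol described above can be driven from any login shell that sees the same $HOME. The following is only a minimal sketch: the helper names monitor_file, monitor_send and monitor_show are hypothetical (they are not part of u.job-monitor), and appending to the .cmd file rather than overwriting it is an assumption about how the monitor expects lines to be written.

```shell
# Directory used by the job monitor, as stated on this page.
monitor_dir="$HOME/top_in_batch"

# Hypothetical helper: print the path of one of the three monitor files.
# $1 = host name of the primary node, $2 = PBS job id, $3 = top | cmd | out
monitor_file() {
    printf '%s/%s_%s.%s\n' "$monitor_dir" "$1" "$2" "$3"
}

# Hypothetical helper: write one command line into node_jobid.cmd.
# The monitor is then expected to run it on the primary node.
# Assumption: appending (>>) is acceptable; the page only says "writes a line".
monitor_send() {
    cmd_file=$(monitor_file "$1" "$2" cmd)
    shift 2
    printf '%s\n' "$*" >> "$cmd_file"
}

# Hypothetical helper: show the latest top snapshot, then the command output.
monitor_show() {
    cat "$(monitor_file "$1" "$2" top)"
    cat "$(monitor_file "$1" "$2" out)"
}

# Example use (node1 and 12345 are placeholders, not real values):
#   monitor_send node1 12345 'ls -l'
#   sleep 10
#   monitor_show node1 12345
```

Since the .top file is refreshed every 10 seconds, waiting about that long before reading the files back is a reasonable guess; how quickly the monitor picks up a new line in the .cmd file is not stated here.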