Chunk lance: Difference between revisions
Revision as of 15:28, 20 December 2011
Chunk_lance
Chunk_lance allows one to run a sequence of monthly model jobs within a single large job.
A GEM/GEMCLIM/CRCM5 simulation usually consists of a sequence of monthly jobs.
Each monthly job consists of three parts:
- auto_launch (copies restart files from the previous month, prepares config files for the new month)
- entry (only in LAM mode, prepares driving data)
- model (main model job)
Chunk_lance runs a series of these three jobs, checking at the end of each model job whether there is still enough time left to calculate another month. If so, the three jobs are executed for another month; if not, a new chunk_job is submitted.
If the model job fails, it is automatically re-executed up to 4 times.
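The control flow described above can be sketched roughly as follows. This is an illustration only, not the actual Chunk_lance script: `run_chunk`, `run_job`, `time_limit_s` and `month_cost_s` are hypothetical names, and the sketch assumes that "enough time" means the remaining wall-clock time covers a fixed estimate for one month. It also assumes that "re-executed up to 4 times" means one first attempt plus four retries.

```python
import time

# Assumption: one first attempt plus up to 4 automatic re-executions.
MAX_MODEL_ATTEMPTS = 5


def run_chunk(months, run_job, time_limit_s, month_cost_s, clock=time.monotonic):
    """Run monthly jobs until the time budget is used up.

    run_job(kind, month) is a hypothetical callback that returns True on success.
    Returns the list of months that are left for the next chunk_job.
    """
    start = clock()
    remaining = list(months)
    while remaining:
        month = remaining[0]
        run_job("auto_launch", month)  # copy restart files, prepare config files
        run_job("entry", month)        # only needed in LAM mode
        for _attempt in range(MAX_MODEL_ATTEMPTS):
            if run_job("model", month):
                break
        else:
            raise RuntimeError(f"model job for {month} failed {MAX_MODEL_ATTEMPTS} times")
        remaining.pop(0)
        # After each model job: is there still enough time for another month?
        if time_limit_s - (clock() - start) < month_cost_s:
            break  # the real procedure would submit a new chunk_job here
    return remaining
```

Passing the clock as a parameter keeps the sketch testable without waiting for real time to pass.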
Since the running job is not always called "chunk_job", the job name no longer shows how far the simulation has progressed. However, one can always look at the listings directory, and a log file called "chunk_job.log" is kept in the config file directory.
This file is essential for the whole chunk_job procedure. The chunk_job itself checks this file to determine which job to execute next. Therefore this log file must only be removed if one wants to restart a simulation from the beginning.
However, to rerun part of a simulation one can edit the log file by hand. Just make sure there is never a blank line at the end of the log file, since the chunk_job only checks the very last line of the log file!
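After editing the log file by hand, a small check like the following could help catch a stray blank line at the end. It is a minimal sketch and not part of Chunk_lance; the helper name `last_log_line` is hypothetical, and it only assumes that the log is a plain text file with one entry per line.

```python
from pathlib import Path


def last_log_line(log_path):
    """Return the last line of the log, or raise if the file ends with a blank line."""
    lines = Path(log_path).read_text().split("\n")
    # A file that ends with a single newline yields one trailing empty string; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    if not lines:
        raise ValueError("log file is empty")
    if lines[-1].strip() == "":
        raise ValueError("blank line at the end of the log file")
    return lines[-1]
```

Calling `last_log_line("chunk_job.log")` in the config file directory either returns the line that chunk_job would look at or reports the problem.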
To start a simulation using Chunk_lance one only has to set the model environment (for example with '333') and execute "Chunk_lance" in the config file directory.
The maximum time for which one chunk_job will run can be set in the file 'configexp.dot.cfg' with the parameter 'BACKEND_time_mod'.
On guillimin one job is allowed to run up to 30 days (2592000 sec).
On colosse one job is allowed to run up to 2 days (172800 sec).
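The second values quoted above follow directly from the day limits. A short sketch of the conversion, in case a different limit is needed (the helper name `wallclock_seconds` is hypothetical):

```python
from datetime import timedelta


def wallclock_seconds(days):
    """Convert a wall-clock limit in days to seconds."""
    return int(timedelta(days=days).total_seconds())


print(wallclock_seconds(30))  # guillimin limit: 2592000
print(wallclock_seconds(2))   # colosse limit: 172800
```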
Restart using Chunk_lance
If a simulation stops, one first has to find out which job (auto_launch, entry or model) was the last one that finished properly.
The best way is to look at the listings, but the log file "chunk_job.log" can also give an indication.
Entry or model job crashed
If the entry or the model job crashed, it is enough to restart the simulation by executing "Chunk_lance" again in the config file directory.
In this case the last line in the log file should be:
... entry ..._E starting at ...
or
... model ..._M starting at ...
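The check on the last log line can be sketched as below. The full format of the log line is not given here, so the patterns rely only on the fragments shown above (the word "entry" or "model", and a job name ending in "_E" or "_M" followed by "starting at"). The function name `crashed_job` is hypothetical, and the sample line in the usage note is invented for illustration.

```python
import re

# Patterns built only from the fragments shown above; everything else may vary.
_ENTRY = re.compile(r"\bentry\b.*_E starting at")
_MODEL = re.compile(r"\bmodel\b.*_M starting at")


def crashed_job(last_line):
    """Return "entry" or "model" if the last log line matches, otherwise None."""
    if _ENTRY.search(last_line):
        return "entry"
    if _MODEL.search(last_line):
        return "model"
    return None
```

For a made-up line such as `"x model myexp_M starting at y"`, the function returns `"model"`, which means that simply executing "Chunk_lance" again should be sufficient. A return value of `None` means the last line does not match either pattern.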