Tuesday, November 24, 2015

Runs and Jobs

So, here we go. We're talking about two forms here. The first, the Job Policy Setup form, is accessed by selecting Options->Job Setup from the ADE XL window. The second, where most of the confusion seems to come in, is the Run Options form, accessed by selecting Options->Run Options.

The upper part of the Job Policy Setup form concerns how you want to distribute your simulation runs in ADE XL.  This will vary depending on what type of job distribution system your company uses (LSF, SGE, etc.).   We'll take that up another time.  For now, let's just assume you have some sort of system that lets you fire off simulations onto a farm of machines.

Now we need to define some terms:

A "run" in ADE XL is what you start each time you press the green Run button in the UI. A run could be a single simulation or it could be 100 corner simulations.
A "job" in ADE XL is a remote process that is started for a run.

The way ADE XL works is that each run starts a number of jobs (the process is actually called "ICRP") on whatever machines you tell it to use (local, SGE, LSF, etc.). It uses the "Max. Jobs" field on the Job Policy Setup form to determine the maximum number of those jobs to start.

Those processes stay alive for some "linger time" so that ADE XL can more efficiently start new simulations as needed.  So it will start by sending out a simulation to each job, and then as each simulation completes, it will send that job a new simulation until all simulations in the run are completed.

Remember that.  A "job" is assigned to a  particular "run" and will likely receive a series of simulations to perform (assuming the "run" involves more than a single simulation).
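To make that dispatch pattern concrete, here is a minimal Python sketch of it. It is only a model of the behavior described above, not ADE XL code: the function name `dispatch`, the idea that each simulation has a known duration, and the choice to start no more jobs than there are simulations are all assumptions made for illustration.

```python
import heapq


def dispatch(sim_durations, max_jobs):
    """Model of one run: hand each simulation to whichever job frees up first.

    Hypothetical helper, not an ADE XL API. Returns (assignments, makespan),
    where assignments maps a job index to the simulation indices it ran.
    """
    # Assumption: never start more jobs than there are simulations.
    n_jobs = min(max_jobs, len(sim_durations))
    # Min-heap of (time at which the job becomes free, job index).
    free_at = [(0.0, job) for job in range(n_jobs)]
    heapq.heapify(free_at)
    assignments = {job: [] for job in range(n_jobs)}
    makespan = 0.0
    for sim, duration in enumerate(sim_durations):
        # The job that finishes earliest receives the next simulation.
        start, job = heapq.heappop(free_at)
        assignments[job].append(sim)
        finish = start + duration
        makespan = max(makespan, finish)
        heapq.heappush(free_at, (finish, job))
    return assignments, makespan


if __name__ == "__main__":
    # Five 10-minute simulations on at most two jobs.
    print(dispatch([10, 10, 10, 10, 10], max_jobs=2))
```

Each job ends up with a series of simulations, which is exactly the point to remember: the job belongs to the run, and the simulations flow through it.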

Let's skip to the Run Options form now.

The default in the Run Options form is Serial: if you hit the green Run button twice, the second run will not use any jobs until the first run completes.

If you set this to Parallel and hit the green Run button twice, the runs will start in parallel. They can either share resources equally (so if Max Jobs = 10, each of the two runs will use 5 jobs), or you can specify a fixed number of jobs per run, something like 3.
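The equal-sharing arithmetic can be sketched in a few lines of Python. The function name `share_equally` is hypothetical, and the way leftover jobs are handed out when Max Jobs does not divide evenly is purely an assumption for illustration; the only case stated here is 10 jobs split across 2 runs.

```python
def share_equally(max_jobs, active_runs):
    """Split max_jobs across active_runs as evenly as possible.

    Hypothetical helper, not an ADE XL API. Assumption: any remainder goes
    to the earliest runs, one extra job each.
    """
    if active_runs < 1:
        raise ValueError("active_runs must be at least 1")
    base, extra = divmod(max_jobs, active_runs)
    return [base + 1 if i < extra else base for i in range(active_runs)]


if __name__ == "__main__":
    print(share_equally(10, 2))  # the case from the text: 5 jobs per run
```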

This applies to all run modes--corners, sweeps, Monte Carlo, optimization, etc.

So, for any given run, all jobs run in parallel. Multiple runs in ADE XL, however, are serial by default and can be made parallel.

Now, what about the For Multiple Runs fields at the bottom of the Job Policy Setup form (reassign immediately, or wait until currently running points complete)?

Let's say your setup is:

  • Run in parallel
  • Share resources equally
  • Max jobs = 10.

You start a run and the simulations are going to take 10 minutes. All ten jobs start running the simulations.

Now you start another run after making some changes to the setup.

So, the question is: do you want to immediately kill five of the current jobs and reassign them to the second run, or do you want to wait until the first five of the currently running simulations finish before reassigning those jobs to the second run?

(Don't worry, the simulations that were killed will be assigned to re-run on one of the (now) five jobs assigned to the first run.)

It's an efficiency issue. Do you want to waste the progress already made by the jobs on a simulation or do you want to make sure the subsequent run(s) start right away?
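A rough way to put numbers on that trade-off is sketched below in Python. This is a simplified model, not anything ADE XL computes: the function name `reassignment_tradeoff` is hypothetical, and it assumes every running simulation started at the same moment and takes the same amount of time.

```python
def reassignment_tradeoff(sim_minutes, elapsed_minutes, jobs_to_move):
    """Compare the two For Multiple Runs choices under a simplified model.

    Hypothetical helper, not an ADE XL API. Assumptions: all running
    simulations started together and each takes sim_minutes to finish.
    """
    if not 0 <= elapsed_minutes <= sim_minutes:
        raise ValueError("elapsed_minutes must be between 0 and sim_minutes")
    return {
        # Killing now throws away the progress made on the moved jobs.
        "reassign_immediately": {
            "wasted_job_minutes": jobs_to_move * elapsed_minutes,
            "second_run_delay_minutes": 0,
        },
        # Waiting wastes nothing but holds the second run back.
        "wait_for_running_points": {
            "wasted_job_minutes": 0,
            "second_run_delay_minutes": sim_minutes - elapsed_minutes,
        },
    }


if __name__ == "__main__":
    # Second run started 4 minutes into 10-minute simulations; 5 jobs move.
    print(reassignment_tradeoff(10, 4, 5))
```

Under this model, the later in the first run you start the second one, the more progress an immediate reassignment throws away and the less time waiting actually costs.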
