Bastian Krüger
2014-07-25 08:33:34 UTC
I recently began working with a cluster that consists of one control node
and several compute nodes. It was set up a couple of years ago by someone
else. In the current setup, there is only one actual Slurm installation,
which is located on the control node in /usr/local/slurm. All the other
nodes just mount that directory at their own /usr/local/slurm. The only
thing that is copied between the nodes is the service startup script in
/etc/init.d.
My question is whether that is a good idea. I realize that if the control
node fails, all the other nodes lose the mounted Slurm directory. But how
critical is that?
Also, I'm thinking about adding a backup control node. This node has to
share a directory with the first control node. Is there any advice on
where this directory should be located? Could it live on the backup control
node, or would it be better to use a separate server?