Discussion:
How to size the controller systems
Louis Capps
2014-08-18 18:08:34 UTC
Permalink
Hi,
We are looking at using SLURM for a large 6000 node cluster and need more
info on the support systems. Can you point me to a sizing guide or info on
the requirements for the primary and backup controllers for SLURM including
CPU, memory and local disk requirements?

Thx,
Louis


*******************************************************************************************

Louis Capps (lcapps-r/Jw6+rmf7HQT0dZR+***@public.gmane.org)
--- Systems Architect - Federal High Performance Computing - US Federal
IMT - IBM Corporation
--- Office (512)286-5556, t/l 363-5556 --- fax 678-6146 --- cell
(512)796-4501
--- Bld 045, 3C80, Austin, TX
http://www-1.ibm.com/servers/deepcomputing/
http://www-03.ibm.com/systems/clusters/
*******************************************************************************************
j***@public.gmane.org
2014-08-18 18:18:38 UTC
Permalink
You could probably use an old netbook, but more memory and cores are
better. Faster CPUs will provide better responsiveness.
Post by Louis Capps
Hi,
We are looking at using SLURM for a large 6000 node cluster and need more
info on the support systems. Can you point me to a sizing guide or info on
the requirements for the primary and backup controllers for SLURM including
CPU, memory and local disk requirements?
Thx,
Louis
--
Morris "Moe" Jette
CTO, SchedMD LLC

Slurm User Group Meeting
September 23-24, Lugano, Switzerland
Find out more http://slurm.schedmd.com/slurm_ug_agenda.html
Uwe Sauter
2014-08-18 18:18:44 UTC
Permalink
Hi Louis,

depending on the usage scenario of your cluster you will have different
requirements.

You can find general information about SLURM configuration on the
SchedMD website: http://slurm.schedmd.com/

There you will also find more specific subpages regarding

* cluster configuration for high throughput
http://slurm.schedmd.com/high_throughput.html
* high availability configuration
http://slurm.schedmd.com/quickstart_admin.html#HA
* BlueGene systems
http://slurm.schedmd.com/bluegene.html

and many more.

Regards,

Uwe
Post by Louis Capps
Hi,
We are looking at using SLURM for a large 6000 node cluster and need
more info on the support systems. Can you point me to a sizing guide or
info on the requirements for the primary and backup controllers for
SLURM including CPU, memory and local disk requirements?
Thx,
Louis
Jason Bacon
2014-08-18 18:33:35 UTC
Permalink
The controller generally shouldn't require much, but if you're running
Linux, be aware that the way memory use is measured in recent kernels
makes slurmctld appear to use a lot of RAM (a large virtual size) when
multiple threads are active. I had to raise the per-process
address-space limit to 10G on our CentOS 6.5 controller nodes, even
though slurmctld was using less than 1G in reality.
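A quick way to see this gap is to compare a process's virtual size
(VmSize) with its resident set (VmRSS). The sketch below inspects the
current shell via /proc/self; the commented lines show the equivalent
check for a running slurmctld (the pgrep-based lookup is an assumption,
not from the original posts):

```shell
# Compare virtual address space (VmSize) with resident memory (VmRSS), in kB.
# Here we inspect this shell itself via /proc/self; the same fields exist
# for any PID under /proc/<pid>/status.
vsz=$(awk '/^VmSize:/ {print $2}' /proc/self/status)
rss=$(awk '/^VmRSS:/ {print $2}' /proc/self/status)
echo "virtual: ${vsz} kB, resident: ${rss} kB"

# For the controller, the same check would be (assuming slurmctld is running):
#   pid=$(pgrep -x slurmctld)
#   awk '/^Vm(Size|RSS):/' "/proc/${pid}/status"
```

A virtual size far above the resident set is the symptom described
above, not actual RAM consumption.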

Regards,

Jason
Post by Louis Capps
Hi,
We are looking at using SLURM for a large 6000 node cluster and need
more info on the support systems. Can you point me to a sizing guide
or info on the requirements for the primary and backup controllers for
SLURM including CPU, memory and local disk requirements?
Thx,
Louis
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jason W. Bacon
jwbacon-***@public.gmane.org

Circumstances don't make a man:
They reveal him.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Marcin Stolarek
2014-08-18 19:58:38 UTC
Permalink
Post by Jason Bacon
The controller generally shouldn't require much, but if you're running
Linux, be aware that the way memory use is measured in recent kernels makes
it look like slurmctld is using a lot of RAM
Can you point me to detailed information about that? How is the memory
measured?
Post by Jason Bacon
when multiple threads are active. I had to up the per-process limit to
10G on our CentOS 6.5 controller nodes, even though slurmctld was using
less than 1G in reality.
Regards,
Jason
Hi,
We are looking at using SLURM for a large 6000 node cluster and need more
info on the support systems. Can you point me to a sizing guide or info on
the requirements for the primary and backup controllers for SLURM including
CPU, memory and local disk requirements?
Thx,
Louis
Jason Bacon
2014-08-18 23:31:35 UTC
Permalink
Unfortunately, I don't recall the details. I did find an article on the
web, but this was back around February.

In a nutshell, our slurmctld was mysteriously crashing on CentOS 6.5. I
think someone on this list pointed me to the Linux kernel issue, so it
might be in the archives.

After I increased the memory limit from 2G to 10G, the problem ceased.

I now have the following in /etc/security/limits.d/91-as.conf on our
controller nodes (limits.conf values are in KB, so 16777216 = 16 GB):

# address-space and locked-memory limits, in KB (16777216 KB = 16 GB)
* soft as 16777216
* hard as 16777216
* soft memlock 16777216
* hard memlock 16777216

Slurmctld has been rock solid since this change. This cluster has 1136
cores, BTW.
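To confirm limits like these actually take effect (a sketch; note that
limits.d settings apply only to sessions started after the change), you
can query the shell's own address-space limit, and for an
already-running daemon read its /proc limits file:

```shell
# Show the current address-space (virtual memory) limit, in kB.
# With the 16777216 settings above in place, a fresh login session
# should report 16777216 here; "unlimited" means no cap is set.
ulimit -v

# For a running daemon the effective limits live in /proc
# (assuming slurmctld is running):
#   grep -E "Max (address space|locked memory)" "/proc/$(pgrep -x slurmctld)/limits"
```

Checking /proc/<pid>/limits is the more reliable test, since it shows
what the daemon itself inherited rather than what your shell got.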
Post by Marcin Stolarek
Post by Jason Bacon
The controller generally shouldn't require much, but if you're
running Linux, be aware that the way memory use is measured in
recent kernels makes it look like slurmctld is using a lot of RAM
Can you point me to detailed information about that? How is the
memory measured?
Post by Jason Bacon
when multiple threads are active. I had to up the per-process
limit to 10G on our CentOS 6.5 controller nodes, even though
slurmctld was using less than 1G in reality.
Regards,
Jason
Post by Louis Capps
Hi,
We are looking at using SLURM for a large 6000 node cluster and
need more info on the support systems. Can you point me to a
sizing guide or info on the requirements for the primary and
backup controllers for SLURM including CPU, memory and local disk
requirements?
Thx,
Louis