Discussion:
Error with slurmctld
Monica Marathe
2014-10-01 19:54:37 UTC
Hey,

It's my first time using SLURM and I'm getting the following errors when I
run slurmctld:

[***@localhost ~]# slurmctld -D -vvvvvv
slurmctld: debug2: No last_config_lite file (/tmp/last_config_lite) to recover
slurmctld: debug4: unable to create link for /tmp/last_config_lite -> /tmp/last_config_lite.old: No such file or directory
slurmctld: error: Configured MailProg is invalid
slurmctld: slurmctld version 14.03.7 started on cluster cluster
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/crypto_munge.so
slurmctld: Munge cryptographic signature plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/select_linear.so
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/preempt_none.so
slurmctld: preempt/none loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/checkpoint_none.so
slurmctld: debug3: Success.
slurmctld: Checkpoint plugin loaded: checkpoint/none
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/acct_gather_energy_none.so
slurmctld: AcctGatherEnergy NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/acct_gather_profile_none.so
slurmctld: AcctGatherProfile NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/acct_gather_infiniband_none.so
slurmctld: AcctGatherInfiniband NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/acct_gather_filesystem_none.so
slurmctld: AcctGatherFilesystem NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug2: No acct_gather.conf file (/usr/local/etc/acct_gather.conf)
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/jobacct_gather_none.so
slurmctld: Job accounting gather NOT_INVOKED plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/ext_sensors_none.so
slurmctld: ExtSensors NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/switch_none.so
slurmctld: switch NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: error: this host (localhost) not valid controller (localhost.localdomain or (null))

How can I fix these issues?

Regards,
Monica
--
- Monica Marathe
Uwe Sauter
2014-10-01 20:13:34 UTC
Hi Monica,
Post by Monica Marathe
Hey,
It's my first time using SLURM and I'm getting the following error when
slurmctld: debug2: No last_config_lite file (/tmp/last_config_lite) to recover
slurmctld: debug4: unable to create link for /tmp/last_config_lite -> /tmp/last_config_lite.old: No such file or directory
I don't know what that means, but this message is probably the least of
your concerns. I think it will disappear once SLURM has been started
successfully for the first time.
Post by Monica Marathe
slurmctld: error: Configured MailProg is invalid
Either make sure that /bin/mail exists or set the parameter MailProg to
the mail program of your choice.
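A quick way to check the first option is to test whether the program exists and is executable. This is a small Python sketch, not part of SLURM; it assumes that "invalid" means the configured path is missing or not executable, which the log line itself does not spell out, and the helper name mail_prog_usable is hypothetical.

```python
import os
import sys


def mail_prog_usable(path: str) -> bool:
    """True if path is an existing regular file that the current user may execute.

    Assumption: this approximates what slurmctld means by
    "Configured MailProg is invalid"; it is not taken from SLURM itself.
    """
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    # Check /bin/mail by default, or the MailProg value given as the first argument.
    prog = sys.argv[1] if len(sys.argv) > 1 else "/bin/mail"
    if mail_prog_usable(prog):
        print(f"{prog} exists and is executable")
    else:
        print(f"{prog} is missing or not executable: install it or set MailProg in slurm.conf")
```

Run it with the path you intend to use for MailProg as its only argument; without an argument it checks /bin/mail.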
Post by Monica Marathe
slurmctld: debug2: No acct_gather.conf file (/usr/local/etc/acct_gather.conf)
If you want to use one of the acct_gather plugins, you have to provide a
configuration file for it. See
http://slurm.schedmd.com/acct_gather.conf.html
Post by Monica Marathe
slurmctld: error: this host (localhost) not valid controller (localhost.localdomain or (null))
Do not use "localhost" for the parameters BackupController, BackupAddr,
ControlAddr and ControlMachine; use the machine's real name instead. Also
make sure that name can be looked up (either via DNS or an entry in
/etc/hosts).
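Reading the error line: slurmctld reports the name it thinks it is running on (localhost) and the names it would accept, namely ControlMachine (localhost.localdomain) or BackupController ((null), i.e. not set). Since "localhost" is not literally equal to "localhost.localdomain", the check fails. The Python sketch below mimics that comparison and adds the lookup test; it is a simplified, hypothetical re-implementation based only on the message text, and stripping the domain from the host name before comparing is an assumption not confirmed by the log.

```python
import socket
from typing import Optional


def short_hostname() -> str:
    """This machine's host name with any domain part stripped ("node01.example.org" -> "node01")."""
    return socket.gethostname().split(".")[0]


def is_valid_controller(host: str, control_machine: Optional[str],
                        backup_controller: Optional[str]) -> bool:
    """Hypothetical mirror of "this host (X) not valid controller (Y or Z)".

    The host is accepted only if its name equals ControlMachine or
    BackupController exactly, so "localhost" never matches "localhost.localdomain".
    """
    return host in {name for name in (control_machine, backup_controller) if name}


def resolves(name: str) -> bool:
    """True if the name can be looked up (DNS or /etc/hosts) to an IPv4 address."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


if __name__ == "__main__":
    # Values from the log above: ControlMachine=localhost.localdomain, no BackupController.
    print(is_valid_controller("localhost", "localhost.localdomain", None))  # False
    print("this machine's short name:", short_hostname())
```

Under the same assumption, short_hostname() on the controller gives the name to try for ControlMachine, and resolves() on that name should return True before you restart slurmctld.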

Also, have a look at http://slurm.schedmd.com/slurm.conf.html for
parameters used in slurm.conf.
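To double-check an existing slurm.conf against the advice about localhost, here is a minimal Python sketch (hypothetical helpers, not a SLURM tool). It reads whitespace-separated Key=Value pairs, drops # comments, treats parameter names case-insensitively as the slurm.conf page describes, and lists any of ControlMachine, ControlAddr, BackupController or BackupAddr still set to a localhost name. It does not handle quoted values or Include lines, and the default path /usr/local/etc/slurm.conf is an assumption based on where the log looked for acct_gather.conf.

```python
import sys
from typing import Dict, List

# Parameters that should name the real machine rather than localhost.
CONTROLLER_KEYS = ("controlmachine", "controladdr", "backupcontroller", "backupaddr")
LOCAL_NAMES = {"localhost", "localhost.localdomain", "127.0.0.1"}


def parse_slurm_conf(text: str) -> Dict[str, str]:
    """Collect Key=Value pairs from slurm.conf text (simplified: no quoting, no Include).

    Keys are lower-cased because slurm.conf parameter names are case-insensitive.
    In this sketch, if a key appears more than once the last value wins.
    """
    params: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        for token in line.split():
            if "=" in token:
                key, value = token.split("=", 1)
                params[key.lower()] = value
    return params


def localhost_problems(params: Dict[str, str]) -> List[str]:
    """Return "key=value" strings for controller parameters that point at localhost."""
    return [f"{key}={params[key]}" for key in CONTROLLER_KEYS
            if params.get(key, "").lower() in LOCAL_NAMES]


if __name__ == "__main__":
    # Assumed default location, inferred from /usr/local/etc/acct_gather.conf in the log.
    path = sys.argv[1] if len(sys.argv) > 1 else "/usr/local/etc/slurm.conf"
    try:
        with open(path) as conf:
            problems = localhost_problems(parse_slurm_conf(conf.read()))
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
    else:
        for item in problems:
            print("still uses localhost:", item)
        if not problems:
            print("no controller parameter points at localhost")
```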

If you would like to start over with a new configuration file, you can
use either the simple or the full version of the configuration file generator:

http://slurm.schedmd.com/configurator.easy.html
http://slurm.schedmd.com/configurator.html

Regards,

Uwe