The following configuration file defines system-wide default values for several clush tool parameters:
clush settings might then be overridden per user if one of the following files is found, in priority order:
- $XDG_CONFIG_HOME/clustershell/clush.conf
- $HOME/.config/clustershell/clush.conf (only if $XDG_CONFIG_HOME is not defined)
- $HOME/.local/etc/clustershell/clush.conf
- $HOME/.clush.conf (deprecated, for 1.6 compatibility only)
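The lookup order listed above can be sketched in a few lines of Python. This is only an illustration of the priority rules, not the code ClusterShell actually runs; the function names `user_config_candidates` and `first_existing` are hypothetical, and treating an empty $XDG_CONFIG_HOME as undefined is an assumption.

```python
import os
import posixpath


def user_config_candidates(name, environ=None):
    """Return per-user candidate paths for a config file, highest priority first.

    Hypothetical helper: it only mirrors the order documented above.
    """
    env = os.environ if environ is None else environ
    home = env.get("HOME", "")
    candidates = []
    xdg = env.get("XDG_CONFIG_HOME")
    if xdg:  # assumption: an empty value counts as "not defined"
        candidates.append(posixpath.join(xdg, "clustershell", name))
    else:
        # Only considered when $XDG_CONFIG_HOME is not defined
        candidates.append(posixpath.join(home, ".config", "clustershell", name))
    candidates.append(posixpath.join(home, ".local", "etc", "clustershell", name))
    if name == "clush.conf":
        # Deprecated location, kept for 1.6 compatibility only
        candidates.append(posixpath.join(home, ".clush.conf"))
    return candidates


def first_existing(paths):
    """Return the first path that exists as a file, or None."""
    return next((p for p in paths if os.path.isfile(p)), None)
```

Calling `first_existing(user_config_candidates("clush.conf"))` would then give the per-user file that takes precedence, or `None` when only the system-wide file applies.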
The following table describes available clush config file settings.
|fanout||Size of the sliding window of ssh(1) connectors.|
|connect_timeout||Timeout in seconds to allow a connection to establish. This parameter is passed to ssh(1). If set to 0, no timeout occurs.|
|command_timeout||Timeout in seconds to allow a command to complete since the connection has been established. This parameter is passed to ssh(1). In addition, the ClusterShell library ensures that any commands complete in less than (connect_timeout + command_timeout). If set to 0, no timeout occurs.|
|color||Whether to use ANSI colors to surround node or nodeset prefix/header with escape sequences to display them in color on the terminal. Valid arguments are never, always or auto (which uses color if standard output/error refer to a terminal). Colors are set to|
|fd_max||Maximum number of open file descriptors permitted per clush process (soft resource limit for open files). This limit can never exceed the system (hard) limit. The fd_max (soft) and system (hard) limits should be high enough to run clush, although their values depend on your fanout value.|
|history_size||Set the maximum number of history entries saved in the GNU readline history list. Negative values imply unlimited history file size.|
|node_count||Should clush display additional (node count) information in buffer header? (yes/no)|
|verbosity||Set the verbosity level: 0 (quiet), 1 (default), 2 (verbose) or more (debug).|
|ssh_user||Set the ssh(1) user to use for remote connection (default is to not specify).|
|ssh_path||Set the ssh(1) binary path to use for remote connection (default is ssh).|
|ssh_options||Set additional (raw) options to pass to the underlying ssh(1) command.|
|scp_path||Set the scp(1) binary path to use for remote copy (default is scp).|
|scp_options||Set additional options to pass to the underlying scp(1) command. If not specified, ssh_options are used instead.|
|rsh_path||Set the rsh(1) binary path to use for remote connection (default is rsh). You could easily use mrsh or krsh by simply changing this value.|
|rcp_path||Same as rsh_path but for rcp command (default is rcp).|
|rsh_options||Set additional options to pass to the underlying rsh/rcp command.|
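To make a few of the settings above concrete, here is a minimal sketch that parses a sample clush.conf with Python's standard configparser module. The `[Main]` section name, the sample values and the helper names are assumptions made for this illustration. The `overall_bound` helper applies the (connect_timeout + command_timeout) rule from the table; returning no bound when either value is 0 is also an assumption, since the table only says that 0 disables the corresponding timeout.

```python
import configparser

# Hypothetical clush.conf content; section name and values are assumptions.
SAMPLE = """\
[Main]
fanout: 64
connect_timeout: 15
command_timeout: 30
color: auto
"""

VALID_COLOR = {"never", "always", "auto"}


def load_settings(text):
    """Parse a few clush settings from configuration text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    main = parser["Main"]
    color = main["color"]
    if color not in VALID_COLOR:
        raise ValueError(f"invalid color value: {color!r}")
    return {
        "fanout": main.getint("fanout"),
        "connect_timeout": main.getfloat("connect_timeout"),
        "command_timeout": main.getfloat("command_timeout"),
        "color": color,
    }


def overall_bound(settings):
    """Upper bound in seconds on a command's duration, or None if unbounded."""
    connect = settings["connect_timeout"]
    command = settings["command_timeout"]
    if connect == 0 or command == 0:
        return None  # assumption: a disabled timeout removes the bound
    return connect + command


settings = load_settings(SAMPLE)
print(settings["fanout"], overall_bound(settings))
```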
ClusterShell defines a node group syntax to represent a collection of nodes. This is a convenient way to manipulate node sets, especially in HPC (High Performance Computing) or with large server farms. This section explains how to configure node group sources. Please see also nodeset node groups for specific usage examples.
ClusterShell loads groups.conf configuration files that define how to obtain node group configuration, i.e. the way the library should access file-based or external node group sources.
The following configuration file defines system-wide default values for groups.conf:
groups.conf settings might then be overridden per user if one of the following files is found, in priority order:
- $XDG_CONFIG_HOME/clustershell/groups.conf
- $HOME/.config/clustershell/groups.conf (only if $XDG_CONFIG_HOME is not defined)
- $HOME/.local/etc/clustershell/groups.conf
This makes it possible for a user to have their own node group configuration. If no readable configuration file is found, group support will be disabled but other node set operations will still work.
groups.conf defines configuration sub-directories, but may also define source definitions by itself. These sources provide external calls that are detailed in External group sources.
The following example shows the content of a groups.conf file where node groups are bound to the source named genders by default:

    [Main]
    default: genders
    confdir: /etc/clustershell/groups.conf.d $CFGDIR/groups.conf.d
    autodir: /etc/clustershell/groups.d $CFGDIR/groups.d

    [genders]
    map: nodeattr -n $GROUP
    all: nodeattr -n ALL
    list: nodeattr -l

    [slurm]
    map: sinfo -h -o "%N" -p $GROUP
    all: sinfo -h -o "%N"
    list: sinfo -h -o "%P"
    reverse: sinfo -h -N -o "%P" -n $NODE

The groups.conf files are parsed with Python's ConfigParser:
- The first section whose name is Main accepts the following keywords:
  - default defines a default node group source (e.g. by referencing a valid section header)
  - confdir defines an optional list of directory paths where the ClusterShell library should look for .conf files which define group sources to use. Each file in these directories with the .conf suffix should contain one or more node group source sections as documented below. These will be merged with the group sources defined in the main groups.conf to form the complete set of group sources to use. Duplicate group source sections are not allowed in those files. Configuration files that are not readable by the current user are ignored (except the one that defines the default group source). The variable $CFGDIR is replaced by the path of the highest priority configuration directory found (where groups.conf resides). The default confdir value enables both system-wide and any installed user configuration (thanks to $CFGDIR). Duplicate directory paths are ignored.
  - autodir defines an optional list of directories where the ClusterShell library should look for .yaml files that define in-file group dictionaries. There is no need to call external commands for these files, as they are parsed by the ClusterShell library itself. Multiple group source definitions in the same file are supported. The variable $CFGDIR is replaced by the path of the highest priority configuration directory found (where groups.conf resides). The default autodir value enables both system-wide and any installed user configuration (thanks to $CFGDIR). Duplicate directory paths are ignored.
- Each following section (genders, slurm) defines a group source. The map, all, list and reverse upcalls are explained below in Group source upcalls.
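As a rough illustration of the rules above, the following sketch reads a groups.conf with Python's standard configparser, expands $CFGDIR in confdir and autodir, and drops duplicate directory paths. It is not the library's own loader: `parse_groups_conf` and `expand_dirs` are hypothetical names, and splitting the directory list on whitespace is an assumption. Interpolation is disabled so that values such as `"%N"` in Slurm upcalls would be read verbatim.

```python
import configparser
from string import Template

GROUPS_CONF = """\
[Main]
default: genders
confdir: /etc/clustershell/groups.conf.d $CFGDIR/groups.conf.d
autodir: /etc/clustershell/groups.d $CFGDIR/groups.d

[genders]
map: nodeattr -n $GROUP
all: nodeattr -n ALL
list: nodeattr -l
"""


def expand_dirs(value, cfgdir):
    """Expand $CFGDIR in a space-separated path list, ignoring duplicates."""
    seen, result = set(), []
    for raw in value.split():
        path = Template(raw).safe_substitute(CFGDIR=cfgdir)
        if path not in seen:
            seen.add(path)
            result.append(path)
    return result


def parse_groups_conf(text, cfgdir):
    """Return the Main settings and the names of the group source sections."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    main = parser["Main"]
    return {
        "default": main.get("default"),
        "confdir": expand_dirs(main.get("confdir", ""), cfgdir),
        "autodir": expand_dirs(main.get("autodir", ""), cfgdir),
        "sources": [s for s in parser.sections() if s != "Main"],
    }


conf = parse_groups_conf(GROUPS_CONF, "/home/alice/.config/clustershell")
print(conf["default"], conf["confdir"], conf["sources"])
```

When the highest priority configuration directory is /etc/clustershell itself, both confdir entries expand to the same path and only one is kept.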
File-based group sources¶
Version 1.7 introduces support for native handling of flat files with different group sources to avoid the use of external upcalls for such static configuration. This can be achieved through the autodir feature and YAML files described below.
YAML group files¶
Cluster node groups can be defined in straightforward YAML files. In such a file, each YAML dictionary defines a group-to-nodes mapping. Different dictionaries are handled as different group sources.
For compatibility reasons with previous versions of ClusterShell, this is not the default way to define node groups yet. So here are the steps needed to try this out:
Rename the following file:
to a file having the .yaml extension, for example:
Ensure that autodir is set in groups.conf:
autodir: /etc/clustershell/groups.d $CFGDIR/groups.d
In the following example, we also changed the default group source to roles in groups.conf (the first dictionary defined in the example), so that @roles:groupname can be shortened to @groupname.
Here is an example of /etc/clustershell/groups.d/cluster.yaml:

    roles:
        adm: 'mgmt[1-2]'                   # define groups @roles:adm and @adm
        login: 'login[1-2]'
        compute: 'node[0001-0288]'
        gpu: 'node[0001-0008]'

        cpu_only: '@compute!@gpu'          # example of inline set operation
                                           # define group @cpu_only with node[0009-0288]

        storage: '@lustre:mds,@lustre:oss' # example of external source reference

        all: '@login,@compute,@storage'    # special group used for clush/nodeset -a
                                           # only needed if not including all groups

    lustre:
        mds: 'mds[1-4]'
        oss: 'oss[0-15]'
        rbh: 'rbh[1-2]'

The syntax of your group file can be quickly tested with the --list-all option of nodeset:

    $ nodeset -LL
    @adm mgmt[1-2]
    @all login[1-2],mds[1-4],node[0001-0288],oss[0-15],rbh[1-2]
    @compute node[0001-0288]
    @cpu_only node[0009-0288]
    @gpu node[0001-0008]
    @login login[1-2]
    @storage mds[1-4],oss[0-15],rbh[1-2]
    @sysgrp sysgrp[1-4]
    @lustre:mds mds[1-4]
    @lustre:oss oss[0-15]
    @lustre:rbh rbh[1-2]

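The inline set operation used for cpu_only ('@compute!@gpu') is a plain set difference. The sketch below reproduces it with ordinary Python sets to show why the group resolves to node[0009-0288]. The `expand` helper is hypothetical and handles only a single `prefix[lo-hi]` range; the real ClusterShell node set syntax is far richer.

```python
import re


def expand(pattern):
    """Expand one 'prefix[lo-hi]' pattern into a set of node names.

    Hypothetical, deliberately minimal helper: zero padding is taken
    from the width of the lower bound.
    """
    match = re.fullmatch(r"([A-Za-z_-]+)\[(\d+)-(\d+)\]", pattern)
    if match is None:
        raise ValueError(f"unsupported pattern: {pattern!r}")
    prefix, low, high = match.groups()
    width = len(low)
    return {f"{prefix}{i:0{width}d}" for i in range(int(low), int(high) + 1)}


compute = expand("node[0001-0288]")
gpu = expand("node[0001-0008]")
cpu_only = compute - gpu  # same idea as '@compute!@gpu'

print(len(cpu_only), min(cpu_only), max(cpu_only))
```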
External group sources¶
Group source upcalls¶
Each node group source is defined by a section name (source name) and up to four upcalls:
- map: External shell command used to resolve a group name into a node set, list of nodes or list of node sets (separated by space characters or by carriage returns). The variable $GROUP is replaced before executing the command.
- all: Optional external shell command that should return a node set, list of nodes or list of node sets of all nodes for this group source. If not specified, the library will try to resolve all nodes by using the list external command in the same group source followed by map for each available group. The notion of all nodes is used by clush -a and also by the special group name
- list: Optional external shell command that should return the list of all groups for this group source (separated by space characters or by carriage returns). If this upcall is not specified, ClusterShell won't be able to list any available groups (e.g. with nodeset -l), so it is highly recommended to set it.
- reverse: Optional external shell command used to find the group(s) of a single node. The variable $NODE is replaced before executing the command. If this upcall is not specified, the reverse operation is computed in memory by the library from the list and map upcalls, if available. Also, if the number of nodes to reverse is greater than the number of available groups, the reverse external command is skipped automatically to reduce resolution time.
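When no reverse upcall is defined, the reverse lookup can be derived from the results of list and map. A minimal sketch of that idea follows, assuming the group-to-nodes mapping has already been resolved into Python sets; `build_reverse` and the sample data are hypothetical and not taken from the library.

```python
def build_reverse(groups):
    """Invert a {group: set_of_nodes} mapping into {node: set_of_groups}."""
    reverse = {}
    for group, nodes in groups.items():
        for node in nodes:
            reverse.setdefault(node, set()).add(group)
    return reverse


# Hypothetical data, as if obtained from the list and map upcalls
groups = {
    "compute": {"node0001", "node0002", "node0009"},
    "gpu": {"node0001", "node0002"},
    "login": {"login1"},
}
reverse = build_reverse(groups)
print(sorted(reverse["node0001"]))
```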
In addition to context-dependent $GROUP and $NODE variables described above, the two following variables are always available and also replaced before executing shell commands:
- $CFGDIR is replaced by the groups.conf base directory path
- $SOURCE is replaced by the current source name (see a usage example just below)
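The variable replacement described above behaves much like `$`-style templating. The sketch below uses Python's standard `string.Template` to show the effect on upcall command lines; `render_upcall` is a hypothetical name and this is not necessarily how the library performs the substitution. `safe_substitute` is used so that variables without a value, and stray `$` characters in the command, are left untouched.

```python
from string import Template


def render_upcall(command, **variables):
    """Replace $GROUP, $NODE, $CFGDIR or $SOURCE in an upcall command line."""
    return Template(command).safe_substitute(variables)


print(render_upcall('sinfo -h -o "%N" -p $GROUP', GROUP="kepler"))
print(render_upcall("get_nodes_from_source.sh $SOURCE $GROUP", SOURCE="racks"))
```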
External command results are cached in memory, for a limited amount of time, to avoid multiple similar calls.
The optional parameter cache_time, when specified within a group source section, defines the number of seconds each upcall result is kept in cache, in memory only. Please note that caching is actually only useful for long-running programs (like daemons) that are using node groups, not for one-shot commands like clush or cluset/nodeset.
The default value of cache_time is 3600 seconds.
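The effect of cache_time can be pictured as a small in-memory cache with a time-to-live. The class below is a sketch of that idea only, not ClusterShell's implementation; the name `UpcallCache`, its interface and the injectable clock are assumptions chosen to keep the example testable.

```python
import time


class UpcallCache:
    """Keep upcall results in memory for at most cache_time seconds."""

    def __init__(self, cache_time=3600, clock=time.monotonic):
        self.cache_time = cache_time
        self._clock = clock
        self._entries = {}  # key -> (timestamp, value)

    def get(self, key, compute):
        """Return the cached value for key, calling compute() if it expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.cache_time:
            return entry[1]
        value = compute()
        self._entries[key] = (now, value)
        return value


cache = UpcallCache(cache_time=60)
nodes = cache.get(("st", "map", "drained"), lambda: "cluster-[0058,0067]")
print(nodes)
```

A one-shot command builds such a cache and exits almost immediately, which is why caching only pays off in long-running programs.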
Multiple sources section¶
Use a comma-separated list of source names in the section header if you want to define multiple group sources with similar upcall commands. The special variable $SOURCE is always replaced by the source name before command execution (here cluster, racks and cpu), for example:

    [cluster,racks,cpu]
    map: get_nodes_from_source.sh $SOURCE $GROUP
    all: get_all_nodes_from_source.sh $SOURCE
    list: list_nodes_from_source.sh $SOURCE

is equivalent to:

    [cluster]
    map: get_nodes_from_source.sh cluster $GROUP
    all: get_all_nodes_from_source.sh cluster
    list: list_nodes_from_source.sh cluster

    [racks]
    map: get_nodes_from_source.sh racks $GROUP
    all: get_all_nodes_from_source.sh racks
    list: list_nodes_from_source.sh racks

    [cpu]
    map: get_nodes_from_source.sh cpu $GROUP
    all: get_all_nodes_from_source.sh cpu
    list: list_nodes_from_source.sh cpu

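The equivalence between the two forms can be checked with a short sketch that splits a comma-separated section header and substitutes $SOURCE per source. It relies only on Python's standard configparser and string.Template; `expand_sources` is a hypothetical helper, not a ClusterShell function.

```python
import configparser
from string import Template

MULTI = """\
[cluster,racks,cpu]
map: get_nodes_from_source.sh $SOURCE $GROUP
all: get_all_nodes_from_source.sh $SOURCE
list: list_nodes_from_source.sh $SOURCE
"""


def expand_sources(text):
    """Return {source_name: {upcall: command}} with $SOURCE already replaced."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    sources = {}
    for section in parser.sections():
        for name in (part.strip() for part in section.split(",")):
            sources[name] = {
                key: Template(value).safe_substitute(SOURCE=name)
                for key, value in parser[section].items()
            }
    return sources


sources = expand_sources(MULTI)
print(sources["racks"]["map"])
```

Note that $GROUP is left in place, since it is only known when a specific group is resolved.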
Return code of external calls¶
Each external command may return a non-zero return code when the operation cannot be performed. But if the call returns zero, for instance for a non-existent group, the user will not receive any error when trying to resolve such an unknown group. The desired behavior is up to the system administrator.
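From the caller's side, this policy could look like the following sketch: a non-zero exit status is turned into an error, while a zero status with empty output silently yields an empty result. `run_upcall` is a hypothetical helper; it takes an argument list rather than a shell command line, and the Python interpreter stands in for a real upcall so that the example is self-contained.

```python
import subprocess
import sys


def run_upcall(argv):
    """Run an external command and return its whitespace-separated output.

    Raises LookupError when the command exits with a non-zero status.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        raise LookupError(f"upcall failed with exit status {proc.returncode}")
    return proc.stdout.split()


# Stand-ins for real upcalls (assumption: any executable would do)
known_group = [sys.executable, "-c", "print('node1 node2')"]
silent_unknown = [sys.executable, "-c", "pass"]  # exits 0, prints nothing

print(run_upcall(known_group))
print(run_upcall(silent_unknown))
```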
Slurm group bindings¶
Enable Slurm node group bindings by renaming the example configuration file
usually installed as
slurm.conf. Four group sources are defined in this file and are detailed below. Each section comes with a long and a short name (for convenience), but both names define the same group source.
The first section slurmpart,sp defines a group source based on Slurm partitions. Each group is named after the partition name and contains the partition's nodes:

    [slurmpart,sp]
    map: sinfo -h -o "%N" -p $GROUP
    all: sinfo -h -o "%N"
    list: sinfo -h -o "%R"
    reverse: sinfo -h -N -o "%R" -n $NODE

Example of use with nodeset on a cluster having two Slurm partitions named kepler and pascal:

    $ nodeset -s sp -ll
    @sp:kepler cluster-[0001-0065]
    @sp:pascal cluster-[0066-0068]

The second section slurmstate,st defines a group source based on Slurm node states. Each group is based on a different state name and contains the nodes currently in that state:

    [slurmstate,st]
    map: sinfo -h -o "%N" -t $GROUP
    all: sinfo -h -o "%N"
    list: sinfo -h -o "%T" | tr -d '*~#$@+'
    reverse: sinfo -h -N -o "%T" -n $NODE | tr -d '*~#$@+'
    cache_time: 60

Here, cache_time is set to 60 seconds instead of the default (3600s) to avoid caching results in memory for too long, in case of state change (this is only useful for long-running processes, not one-shot commands).
Example of use with nodeset to get the current nodes that are in the Slurm state drained:

    $ nodeset -f @st:drained
    cluster-[0058,0067]

The third section slurmjob,sj defines a group source based on Slurm jobs. Each group is based on a running job ID and contains the nodes currently allocated for this job:

    [slurmjob,sj]
    map: squeue -h -j $GROUP -o "%N"
    list: squeue -h -o "%i" -t R
    reverse: squeue -h -w $NODE -o "%i"
    cache_time: 60

The fourth section slurmuser,su defines a group source based on Slurm users. Each group is based on a username and contains the nodes currently allocated for jobs belonging to the username:

    [slurmuser,su]
    map: squeue -h -u $GROUP -o "%N" -t R
    list: squeue -h -o "%u" -t R
    reverse: squeue -h -w $NODE -o "%i"
    cache_time: 60

Example of use with clush to execute a command on all nodes with running jobs of username:

    $ clush -bw@su:username 'df -Ph /scratch'
    $ clush -bw@su:username 'du -s /scratch/username'

cache_time is also set to 60 seconds instead of the default (3600s) to avoid caching results in memory for too long, because this group source is likely very dynamic (this is only useful for long-running processes, not one-shot commands).
You can then easily find nodes associated with a Slurm job ID:

    $ nodeset -f @sj:686518
    cluster-[0003,0005,0010,0012,0015,0017,0021,0055]

xCAT group bindings¶
Enable xCAT node group bindings by renaming the example configuration file
usually installed as
xcat.conf. A single group source is defined in this file and is detailed below.
The section xcat defines a group source based on xCAT static node groups:

    [xcat]

    # list the nodes in the specified node group
    map: lsdef -s -t node $GROUP | cut -d' ' -f1

    # list all the nodes defined in the xCAT tables
    all: lsdef -s -t node | cut -d' ' -f1

    # list all groups
    list: lsdef -t group | cut -d' ' -f1

Example of use with cluset:

    $ lsdef -s -t node dtn
    sh-dtn01 (node)
    sh-dtn02 (node)
    $ cluset -s xcat -f @dtn
    sh-dtn[01-02]

Modifying library defaults is for advanced users only as that could change the behavior of tools using ClusterShell. Moreover, tools are free to enforce their own defaults, so changing library defaults may not change a global behavior as expected.
Since version 1.7, most defaults of the ClusterShell library may be overridden in defaults.conf.
The following configuration file defines ClusterShell system-wide defaults:
defaults.conf settings might then be overridden per user if one of the following files is found, in priority order:
- $XDG_CONFIG_HOME/clustershell/defaults.conf
- $HOME/.config/clustershell/defaults.conf (only if $XDG_CONFIG_HOME is not defined)
- $HOME/.local/etc/clustershell/defaults.conf
Use case: rsh¶
If your cluster uses an rsh variant like krsh, you may want to change it in the library defaults.
An example file is usually available in /usr/share/doc/clustershell-*/examples/defaults.conf-rsh and could be copied to /etc/clustershell/defaults.conf or to an alternate path described above. Basically, the change consists of defining an alternate distant worker by Python module name, as follows:

    [task.default]
    distant_workername: Rsh

Use case: Slurm¶
If your cluster naming scheme has multiple dimensions, we recommend that you disable some nD folding when using Slurm, which is currently unable to detect some multidimensional node indexes when they are not explicitly enclosed in square brackets.
To do so, set fold_axis to -1 in the library defaults so that nD folding is only computed on the last axis (this seems to work best with Slurm):

    [nodeset]
    fold_axis: -1

That way, node sets computed by ClusterShell tools can be passed to Slurm without error.