Multicard MAS

From MCEWiki
Revision as of 12:31, 15 May 2018 by Mandana
Note: PCI card firmware U0107 (the current release) and earlier versions contain bugs that prevent the use of multiple PCI cards in one computer.

Most of MAS is agnostic about the number of fibre cards in the system. By default, MAS only supports one fibre card. Support for multiple cards ("Multicard MAS") can be turned on, however, when building MAS. This page outlines specific procedures and caveats when using Multicard MAS.

Care has been taken to make Multicard MAS backwards compatible with the old, single card system, to permit use of legacy scripts and applications (albeit, perhaps restricted to one of the fibre cards in the system).

Building MAS

For generic build instructions, see: MAS OS setup

To enable Multicard MAS, pass --enable-multicard to configure before building MAS:

 ./configure --enable-multicard[=N]

where N is the maximum number of cards you want MAS to support. If omitted, N defaults to 2. Specifying N<=1 is the same as not specifying this option at all (i.e. multicard support is turned off). Enabling the option builds both a multicard-capable driver and a multicard-capable MAS library and applications.

Because the subracks attached to each fibre card may be different, instead of a single mce.cfg file, Multicard MAS requires one mce.cfg file for each fibre card supported, called /etc/mce/mce0.cfg, /etc/mce/mce1.cfg, /etc/mce/mce2.cfg, &c. As a result, instead of making an mce.cin template file, you must make mce0.cin, mce1.cin, mce2.cin, &c.
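
The rules above can be restated as a small Python sketch. This is only an illustration of the description on this page, not code taken from MAS's configure script; the function names are hypothetical, and the single-card path /etc/mce/mce.cfg is an assumption.

```python
from typing import List, Optional

DEFAULT_MAX_CARDS = 2  # used when --enable-multicard is given without =N


def effective_card_count(option_given: bool, n: Optional[int] = None) -> int:
    """Number of cards supported for a given --enable-multicard[=N] setting."""
    if not option_given:
        return 1                    # default build: single card only
    if n is None:
        return DEFAULT_MAX_CARDS    # option given without =N
    return n if n > 1 else 1        # N<=1 behaves as if the option were absent


def config_files(option_given: bool, n: Optional[int] = None,
                 etc_dir: str = "/etc/mce") -> List[str]:
    """Configuration files expected for the build (single-card name assumed)."""
    count = effective_card_count(option_given, n)
    if count == 1:
        return [f"{etc_dir}/mce.cfg"]
    return [f"{etc_dir}/mce{i}.cfg" for i in range(count)]


print(config_files(True))      # --enable-multicard
print(config_files(True, 3))   # --enable-multicard=3
```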

Running make and make install should proceed as usual. (See MAS OS setup.)

Card numbering

The kernel driver assigns sequential physical numbers to cards in the order in which they're passed in by the kernel at boot time. Because there is no guarantee that this procedure results in the same physical card enumeration each time, MAS abstracts physical card numbers to logical card numbers, which are fixed to a given physical card. Although the cards are indistinguishable themselves, MAS uses the PCI slot address to break the degeneracy.

After the kernel boots and the kernel driver has assigned physical card numbers, udev runs mas_mknodes for each card it finds, passing the script the PCI slot address of the card. mas_mknodes then consults /proc/mce_dsp to determine the card's physical number and the file /etc/mce/mce_card_id (if present) to determine its logical card number. It then makes the nodes /dev/mce_cmd<l>, /dev/mce_data<l>, and /dev/mce_dsp<l> pointing to the appropriate physical card, where l is the logical card number. (See the udev ruleset in scripts/91-mas.rules and "mas_mknodes --help" for more details.)

Generating /etc/mce/mce_card_id

The /etc/mce/mce_card_id file is a simple text file with two columns and one row for each fibre card supported. For a given card, the first column contains its PCI slot address and the second column its logical card number. The file may also contain comment lines whose first character is a hash mark (#). A typical file might look like:

# PCI_SLOT_ID   LOGICAL_CARD_NUM
0000:02:0c.0        0
0000:02:0d.0        1
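
For tools that need to read this file, a minimal Python parser might look like the sketch below. It is not part of MAS; the function name is hypothetical, and skipping blank lines and rejecting duplicate slots are assumptions about reasonable behaviour rather than documented rules.

```python
from typing import Dict


def parse_card_id(text: str) -> Dict[str, int]:
    """Map PCI slot address -> logical card number."""
    cards: Dict[str, int] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # blank line (assumed) or comment
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, got {len(fields)}")
        slot, logical = fields[0], int(fields[1])
        if slot in cards:
            raise ValueError(f"line {lineno}: duplicate PCI slot {slot}")
        cards[slot] = logical
    return cards


EXAMPLE = """\
# PCI_SLOT_ID   LOGICAL_CARD_NUM
0000:02:0c.0        0
0000:02:0d.0        1
"""
print(parse_card_id(EXAMPLE))
```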

The file can be created by hand, but it is typically made by running the mas_make_card_id script. This script examines the system as it is currently configured and creates an /etc/mce/mce_card_id which will result in the same configuration on subsequent boots. The script obtains logical card numbers by searching for /dev/mce_cmd# devices, and physical card numbers from /proc/mce_dsp. It ignores /dev/mce_cmd# devices that do not point to a valid physical card. It also ignores physical cards that do not have a corresponding /dev/mce_cmd# (i.e. no logical card number assigned), unless the '-a' option is passed to the script, in which case it automatically assigns logical card numbers to unenumerated physical cards.

As a result, a /etc/mce/mce_card_id file can be created for a brand new system with no cards configured by running:

mas_make_card_id -a
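
The selection logic described for mas_make_card_id can be sketched in Python as follows. This is a model only, not the script itself: the inputs are plain dictionaries standing in for the /dev/mce_cmd# nodes and /proc/mce_dsp, the function name is hypothetical, and the policy of giving unenumerated cards the lowest free logical number is an assumption.

```python
from typing import Dict, List


def make_card_id(dev_nodes: Dict[int, int], physical_slots: Dict[int, str],
                 assign_all: bool = False) -> List[str]:
    """dev_nodes: logical number -> physical number (from device nodes).
    physical_slots: physical number -> PCI slot address.
    assign_all: model of the -a option."""
    rows: Dict[int, str] = {}        # logical number -> PCI slot address
    seen_physical = set()
    for logical, physical in sorted(dev_nodes.items()):
        if physical not in physical_slots:
            continue                 # node does not point to a valid card
        rows[logical] = physical_slots[physical]
        seen_physical.add(physical)
    if assign_all:
        next_free = 0
        for physical in sorted(physical_slots):
            if physical in seen_physical:
                continue             # already has a logical number
            while next_free in rows:
                next_free += 1       # assumed policy: lowest unused number
            rows[next_free] = physical_slots[physical]
    lines = ["# PCI_SLOT_ID   LOGICAL_CARD_NUM"]
    lines += [f"{slot}        {logical}" for logical, slot in sorted(rows.items())]
    return lines


# A brand new system: two cards found, no device nodes yet, -a given.
print("\n".join(make_card_id({}, {0: "0000:02:0c.0", 1: "0000:02:0d.0"},
                             assign_all=True)))
```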

If /etc/mce/mce_card_id doesn't exist, mas_mknodes simply uses physical card numbers for logical card numbers. If the file does exist, but the specified PCI slot address isn't in the file, mas_mknodes will fail.
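
The fallback and failure cases can be summarised in a few lines of Python. This is a sketch of the behaviour described above, not the mas_mknodes implementation; the function name and the choice of exception are hypothetical.

```python
from typing import Dict, Optional


def logical_card_number(slot: str, physical: int,
                        card_ids: Optional[Dict[str, int]]) -> int:
    """card_ids is the parsed mce_card_id mapping (PCI slot -> logical
    number), or None when the file does not exist."""
    if card_ids is None:
        return physical              # no file: logical number = physical number
    if slot not in card_ids:
        # file exists but does not list this slot: treated as a failure
        raise LookupError(f"PCI slot {slot} is not listed in mce_card_id")
    return card_ids[slot]


ids = {"0000:02:0c.0": 0, "0000:02:0d.0": 1}
# The kernel happened to enumerate this card first (physical 0),
# but the file pins it to logical card 1.
print(f"/dev/mce_cmd{logical_card_number('0000:02:0d.0', 0, ids)}")
```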

Using Multicard MAS

Note: The following assumes a basic familiarity with the use of MAS and MCE script with a single fibre card. See MAS and MCE script for further details.

When multiple fibre cards are present in a system, both MAS and MCE script need facilities to select and distinguish between them. Multicard MAS complicates the MAS/MCE script ecology by requiring paths previously specified by environmental variables, most notably $MAS_DATA, to change based on which fibre card is being used.

Summary

The following is a quick summary, for people familiar with single-card MAS, of how to use multiple fibre cards on the command line:

  • Don't explicitly set any environmental variables
  • Instead of the old mas_env.bash, add to your .bashrc:
 eval `/usr/mce/bin/mas_var -e -s`
(NB: those are back-ticks). This will insert all sorts of useful MAS_... variables into your environment.
  • Card #0 is selected by default.
  • To change to card N, execute:
 $ eval `mas_var -n N -e -s`
  • The two MCEs need different mce.cfg files (called mce0.cfg, mce1.cfg, ...). MAS will make these for you.
  • The two MCEs need different configuration data. Configuration is distinguished via the array_id file.

The following sections go into more detail about how multicard MAS works and why it's done that way.

Explicit card selection with MAS applications

Most MAS application programs have a -n switch which allows specifying explicitly which logical card to operate on:

$ mce_cmd -n 0 -qx rb cc card_id
Line   0 : ok : 0xb5ccf7
$ mce_cmd -n 1 -qx rb cc card_id
Line   0 : ok : 0x20f373d

This is fine for simple operations, but can get tedious with repeated use, and doesn't work with many MCE scripts (which don't pass -n when spawning MAS applications like mce_cmd).

Card selection via the environment

If no explicit card is passed to a MAS application with -n, it will consult the environmental variable $MAS_MCE_DEV to determine the current logical card number:

$ export MAS_MCE_DEV=0
$ mce_cmd -qx rb cc card_id
Line   0 : ok : 0xb5ccf7
$ export MAS_MCE_DEV=1
$ mce_cmd -qx rb cc card_id
Line   0 : ok : 0x20f373d

This is better: card numbers specified this way are even honoured by MCE scripts. For backwards compatibility, if -n is not specified and $MAS_MCE_DEV is not set, card zero is used by default.
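
The resulting order of precedence (explicit -n, then $MAS_MCE_DEV, then card zero) can be written down as a short Python sketch. The function is hypothetical and only mirrors the description on this page; treating an empty $MAS_MCE_DEV as unset is an assumption.

```python
import os
from typing import Mapping, Optional


def select_card(cli_card: Optional[int] = None,
                env: Optional[Mapping[str, str]] = None) -> int:
    """Return the logical card number an application would operate on."""
    if cli_card is not None:
        return cli_card                  # explicit -n always wins
    env = os.environ if env is None else env
    value = env.get("MAS_MCE_DEV")
    if value:                            # set and non-empty (assumption)
        return int(value)
    return 0                             # backwards-compatible default


print(select_card(1, {"MAS_MCE_DEV": "0"}))     # explicit card given
print(select_card(None, {"MAS_MCE_DEV": "1"}))  # taken from the environment
print(select_card(None, {}))                    # neither given
```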

Card-dependent paths and mas_var

While $MAS_MCE_DEV solves the card selection problem, the problem of distinguishing the data output from the two cards still remains. Writing configuration and data from more than one card to the same data directory will confuse much of MCE script. The minimum solution requires changing at least $MAS_DATA and $MAS_DATA_ROOT when switching fibre cards to get MCE script to work:

$ export MAS_MCE_DEV=0
$ export MAS_DATA_ROOT=/data/mce0
$ export MAS_DATA=/data/mce0/current_data
$ mce_run test_data 100000 s
$ export MAS_MCE_DEV=1
$ export MAS_DATA_ROOT=/data/mce1
$ export MAS_DATA=/data/mce1/current_data
$ mce_run test_data 100000 s

That's a bit of a pain, so the MCE scripts have been overhauled and now never explicitly reference environmental variables. Instead, a new MAS application called mas_var has been written, which the MCE scripts use to calculate paths for the current fibre card:

$ export MAS_MCE_DEV=0
$ mas_var --data-dir
/data/mce0/current_data
$ export MAS_MCE_DEV=1
$ mas_var --data-dir
/data/mce1/current_data

And now we're back to:

$ export MAS_MCE_DEV=0
$ mce_run test_data 100000 s
$ export MAS_MCE_DEV=1
$ mce_run test_data 100000 s

(where mce_run contains calls to mas_var) with data written to either /data/mce0/current_data/test_data or /data/mce1/current_data/test_data as appropriate. So standard operating procedure should now be to not define any MAS_* environmental variables except for $MAS_MCE_DEV.

mas_var calculates its paths based on information in mas.cfg which was, in turn, generated by information passed to MAS's ./configure script. Use of the mas_var program is explained on its own page, which you might want to read.
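
A Python program that needs a card-specific path without depending on the auto_setup package could shell out to mas_var in the same way. The sketch below is an assumption-laden illustration: the helper name is hypothetical, it uses only the --data-dir option and the $MAS_MCE_DEV and $MAS_VAR variables described on this page, and it assumes mas_var prints the requested path on standard output as in the transcripts above.

```python
import os
import subprocess
from typing import Optional, Sequence


def mas_var_query(option: str, card: Optional[int] = None,
                  mas_var: Optional[Sequence[str]] = None) -> str:
    """Run mas_var with one option (e.g. "--data-dir") and return its output.

    card    -- logical card number to place in $MAS_MCE_DEV for the call;
               None leaves the inherited environment untouched.
    mas_var -- command prefix; defaults to $MAS_VAR, falling back to
               /usr/mce/bin/mas_var.
    """
    if mas_var is None:
        mas_var = [os.environ.get("MAS_VAR", "/usr/mce/bin/mas_var")]
    env = dict(os.environ)
    if card is not None:
        env["MAS_MCE_DEV"] = str(card)
    result = subprocess.run([*mas_var, option], env=env, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()
```

On a machine with MAS installed, mas_var_query("--data-dir", card=1) would be expected to return the data directory for card 1.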

Usability problems without an environment

In addition to specifying paths to the MCE scripts, the $MAS_* environmental variables were also handy when working interactively with an MCE. If no paths are provided in the environment, it's no longer possible to do something convenient like:

$ cd $MAS_DATA

To get around this, mas_var has a mode in which it prints bash (or C-shell) commands that set up the whole environment. These can be fed back into the currently running shell using the shell built-in command eval and a pair of back-ticks (`):

$ echo $MAS_DATA

$ eval `mas_var -s`
$ echo $MAS_DATA
/data/mce0/current_data

(See the mas_var page for more information on -s and -c.)

Environmental overrides

Environment-less operation also removes another feature of legacy MCE script: if scripts always use mas_var to determine paths, and mas_var just generates them from the information in mas.cfg, then it's no longer possible to override MAS paths, which previously could be done by simply changing the appropriate $MAS_* variable.

In the past we could do:

$ mce_raw_acq 1
Acquiring raw data to /data/cryo/current_data/1350606349_raw
$ export MAS_DATA=/tmp
$ mce_raw_acq 1
Acquiring raw data to /tmp/1350606363_raw

To recover this behaviour, when mas_var is asked for a path that has a corresponding environmental variable (for example, "mas_var --data-dir" is associated with $MAS_DATA), it simply returns the value of that variable if it has been set:

$ mas_var --data-dir
/data/mce0/current_data
$ mce_raw_acq 1
Acquiring raw data to /data/mce0/current_data/1350606349_raw
$ export MAS_DATA=/tmp
$ mas_var --data-dir
/tmp
$ mce_raw_acq 1
Acquiring raw data to /tmp/1350606363_raw

so environmental overrides are once again possible.
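
In other words, the lookup is "environment first, then the per-card default". A minimal Python model of that rule is given below. It is not mas_var's code; the function names are hypothetical and the /data/mceN layout is simply copied from the examples on this page, standing in for whatever mas.cfg specifies.

```python
import os
from typing import Mapping, Optional


def default_data_dir(card: int) -> str:
    """Stand-in for the per-card default that mas_var derives from mas.cfg."""
    return f"/data/mce{card}/current_data"


def data_dir(card: int, env: Optional[Mapping[str, str]] = None) -> str:
    """Model of "mas_var --data-dir": $MAS_DATA wins if it is set."""
    env = os.environ if env is None else env
    if "MAS_DATA" in env:
        return env["MAS_DATA"]           # environmental override
    return default_data_dir(card)


print(data_dir(0, {}))                    # no override: per-card default
print(data_dir(0, {"MAS_DATA": "/tmp"}))  # override in effect
```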

Sticky overrides

Introducing environmental override capability to mas_var gives us another problem: it can have serious unexpected results on a multicard system since the overrides defeat the card-specific path solution that mas_var originally gave us:

$ export MAS_MCE_DEV=0
$ mas_var --data-dir
/data/mce0/current_data
$ export MAS_DATA=/data/mce0/current_data
$ mas_var --data-dir
/data/mce0/current_data
$ export MAS_MCE_DEV=1
$ mas_var --data-dir
/data/mce0/current_data

(i.e. environmental variables are "sticky": they don't change when the card changes, which happens to be the very problem we were trying to solve with mas_var in the first place).

There's a -e switch to mas_var which forces it to ignore the current environment and reset everything to the defaults contained in mas.cfg. That solves the problem here:

$ export MAS_MCE_DEV=0
$ mas_var --data-dir
/data/mce0/current_data
$ export MAS_DATA=/data/mce0/current_data
$ mas_var --data-dir
/data/mce0/current_data
$ export MAS_MCE_DEV=1
$ eval `mas_var -e -s`
$ mas_var --data-dir
/data/mce1/current_data

Actually, since mas_var is a regular MAS application, it supports the -n switch and we can make card-selection and environment reset a one-liner:

$ eval `mas_var -n 0 -e -s`
$ echo $MAS_MCE_DEV
0
$ mas_var --data-dir
/data/mce0/current_data
$ export MAS_DATA=/data/mce0/current_data
$ mas_var --data-dir
/data/mce0/current_data
$ eval `mas_var -n 1 -e -s`
$ echo $MAS_MCE_DEV
1
$ mas_var --data-dir
/data/mce1/current_data

which makes things pretty useable.
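
The interaction between sticky overrides, -e and -n can be modelled with a dictionary standing in for the shell environment. The Python sketch below is purely illustrative: refresh_env is a hypothetical stand-in for "eval `mas_var [-n N] [-e] -s`", only two variables are modelled, and the paths are the ones used in the examples on this page.

```python
from typing import Dict, Optional


def defaults_for(card: int) -> Dict[str, str]:
    """Stand-in for the per-card defaults that come from mas.cfg."""
    root = f"/data/mce{card}"
    return {"MAS_DATA_ROOT": root, "MAS_DATA": f"{root}/current_data"}


def refresh_env(env: Dict[str, str], card: Optional[int] = None,
                ignore_env: bool = False) -> Dict[str, str]:
    """card models -n N; ignore_env models -e. Returns the new environment."""
    if card is None:
        card = int(env.get("MAS_MCE_DEV", "0"))
    new_env = dict(env)
    new_env["MAS_MCE_DEV"] = str(card)
    for name, default in defaults_for(card).items():
        if ignore_env or name not in env:
            new_env[name] = default      # otherwise the old value sticks
    return new_env


env0 = refresh_env({}, card=0, ignore_env=True)   # initial set-up for card 0
env1 = dict(env0, MAS_MCE_DEV="1")                # export MAS_MCE_DEV=1

print(refresh_env(env1)["MAS_DATA"])                    # still card 0's path
print(refresh_env(env1, ignore_env=True)["MAS_DATA"])   # reset to card 1's path
print(refresh_env(env0, card=1, ignore_env=True)["MAS_DATA"])  # one-liner form
```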

For interactive use, that's probably good enough: we can put

eval `mas_var -e -s`

in our .bashrc file to initialise the MAS environment when we first log in, and then just run:

$ eval `mas_var -n N -e -s`

whenever we want to switch to card N.

Streamlining: the "minimal environment"

If we don't care about being able to do things like "cd $MAS_DATA" we can simplify the environment even more to the point that running mas_var is no longer needed to switch cards. This is especially handy when running MAS for multiple cards semi- or non-interactively. The minimum environment needed to run MCE scripts on any particular card is:

  • $MAS_MCE_DEV to choose the card
  • $MAS_VAR to point the MCE scripts to mas_var
  • modified $PATH and $PYTHONPATH to allow the shell and Python to find the MAS and MCE script elements needed to run the system.

Running "mas_var -s" (or "mas_var -c" in the C shell) with the -x option provides this minimal environment (also add -e if you want to remove any previous environmental overrides):

$ eval `mas_var -x -s`

This minimal environment allows very simple card switching at the expense of losing the convenience of having the $MAS_... paths in the environment:

$ eval `mas_var -x -s`
$ MAS_MCE_DEV=0 mce_raw_acq 1
Acquiring raw data to /data/mce0/current_data/1350606349_raw
$ MAS_MCE_DEV=1 mce_raw_acq 1
Acquiring raw data to /data/mce1/current_data/1350606363_raw

Similarly, automated higher level applications which control multiple fibre cards in the same program can switch cards, if launched from the minimal environment, by then just setenv(3)ing a single variable ($MAS_MCE_DEV), which has very appealing robustness benefits.
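
For a controller written in Python, the same idea might be wrapped in a context manager that sets $MAS_MCE_DEV for the duration of a block and restores the previous value afterwards. This is a sketch under the assumption that the program was launched from the minimal environment; selected_card is a hypothetical helper, not part of MAS.

```python
import contextlib
import os


@contextlib.contextmanager
def selected_card(card: int):
    """Temporarily select a logical card via $MAS_MCE_DEV."""
    previous = os.environ.get("MAS_MCE_DEV")
    os.environ["MAS_MCE_DEV"] = str(card)
    try:
        yield
    finally:
        # Restore whatever was there before, including "not set at all".
        if previous is None:
            os.environ.pop("MAS_MCE_DEV", None)
        else:
            os.environ["MAS_MCE_DEV"] = previous


for card in (0, 1):
    with selected_card(card):
        # Child processes started here inherit the selection, e.g.
        #   subprocess.run(["mce_raw_acq", "1"], check=True)
        print("selected card", os.environ["MAS_MCE_DEV"])
```

Because child processes inherit os.environ, anything spawned inside the with block sees the selected card, and nothing else in the environment needs to change.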

Writing Multicard-MAS-enabled scripts

Here are some pointers for writing MCE scripts for use in multicard environments:

Bash

  • never hard-code MAS or MCE paths.
  • add to the top of your script:
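# Locate mas_var, honouring $MAS_VAR if it is already set
if [ ! -x "${MAS_VAR:=/usr/mce/bin/mas_var}" ]; then
  echo "Cannot find mas_var.  Set MAS_VAR to the full path to the mas_var binary." >&2
  exit 1
else
  # Import the regular MAS_* variables for the current card
  eval "$("${MAS_VAR}" -s)"
fi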
The above tries to find mas_var (using $MAS_VAR if set) and then uses it to set all the "regular" environmental variables.
  • after executing the above, your script can use the regular environmental variables: $MAS_DATA, $MAS_BIN, etc.

Python

  • never hard-code MAS or MCE paths.
  • don't use environmental variables
  • a mas_path module has been created to deal with paths; it's part of the auto_setup package. At the top of your script, add:
from auto_setup.util import mas_path
mas_path = mas_path()
  • Use mas_path to fetch directories:
data_dir = mas_path.data_dir()
  • Best practice is to use os.path.join to concatenate path elements like these:
import os
data_file = os.path.join(mas_path.data_dir(), "my_data_file")