MAS malfunction diagnosis

From MCEWiki

Work through the questions below to find out what is wrong with MAS. A "yes" answer means that part of the system is working; a "no" answer points to a problem.

Does the PC boot?

To test this, turn on the machine. If it boots relatively cleanly, then great: booting isn't your problem.

If you get a kernel panic or some other serious error during boot, try each of the following:

  • power cycle the computer, leaving it powered down for at least 5 seconds. This is to reset the state of the PCI card in case it is angry with the PC.
  • boot using the Ubuntu default kernel instead of the patched (bigphys) one. To do this, press "Escape" at the beginning of the boot sequence to enter the boot-loader menu (Grub), then select "Ubuntu, kernel 2.6.15-26-server" from the menu.

If the power cycle seems to fix the problem, then the PCI card was angry with the PC. If this problem recurs, make sure that the driver is running in quiet mode, and start to be suspicious about your hardware.

If the bigphys kernel does not boot but the Ubuntu default kernel does, then there may be some problems with the boot options; see the section on "Boot Menu" in MAS OS setup.


Is the bigphys kernel loaded?

From a terminal, run the command

uname -r

You should receive the response

2.6.15.7-bigphys

If you do not, the bigphys kernel is not being loaded. See the section on "Boot Menu" in MAS OS setup.
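If you run this check often, it can be wrapped in a small shell function. The following is only a sketch: the function name check_kernel is made up for illustration, and the expected release string is simply the one quoted on this page, so adjust it if your installation uses a different bigphys kernel.

```shell
# Compare a kernel release string with the expected bigphys release.
#   $1 = release to test (defaults to the running kernel, via uname -r)
#   $2 = expected release (defaults to the value quoted on this page)
check_kernel() {
    ck_actual="${1:-$(uname -r)}"
    ck_expected="${2:-2.6.15.7-bigphys}"
    if [ "$ck_actual" = "$ck_expected" ]; then
        echo "ok: running $ck_actual"
    else
        echo "warning: running $ck_actual, expected $ck_expected"
        return 1
    fi
}
```

Calling check_kernel with no arguments tests the running kernel; a non-zero return status means the bigphys kernel is not the one that booted.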

Next, check the size of the bigphys allocation:

cat /proc/bigphysarea

This should return a message like this:

Big physical area, size 32768 kB
                       free list:             used list:
number of blocks:             1                      1
size of largest block:    23000 kB                9768 kB
total:                    23000 kB                9768 kB

If it doesn't, the "bigphysarea" boot option has probably not been configured. See "Boot Menu" in MAS OS setup. The "Big physical area, size" number doesn't need to be larger than about 10000 kB.
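To pull just the size figure out of that output, something like the following should work. It is a sketch that assumes the first line always has the form shown above ("Big physical area, size N kB"); the function name bigphys_size_kb is hypothetical.

```shell
# Print the total bigphysarea size in kB.
#   $1 = file to read (defaults to /proc/bigphysarea)
# Returns non-zero if the file cannot be read, which usually means the
# running kernel has no bigphysarea support.
bigphys_size_kb() {
    bp_file="${1:-/proc/bigphysarea}"
    [ -r "$bp_file" ] || return 1
    # Keep only the number from the "Big physical area, size N kB" line.
    sed -n 's/^Big physical area, size \([0-9][0-9]*\) kB.*/\1/p' "$bp_file"
}
```

Run with no arguments, it reads /proc/bigphysarea directly. Empty output or a non-zero status suggests the "bigphysarea" boot option is missing; otherwise compare the number against the rough 10000 kB figure mentioned above.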

Is the MAS driver loaded?

The easiest way to check for the driver is

cat /proc/mce_dsp

If the file doesn't exist, the driver is not loaded. If the driver is loaded, a bunch of diagnostic messages will be produced. A working system will look something like this:

mce_dsp driver version gamow/mas:196
    fakemce:  no
    realtime: no
    bigphys:  yes
  data buffer:
    virtual:  0xc1833000
    bus:      0x01833000
    count:          4882
    head:           1000
    tail:           1000
    drops:             0
    size:          0x800
    data:          0x4d0
    mode:     quiet mode
  mce commander:
    state:    idle
  dsp commander:
    state:    idle
  dsp pci registers:
    hstr:     0x0003
    hctr:     0x0900

The most important things in this list are:

fakemce:  no
bigphys:  yes
mode:     quiet mode
hstr:     0x0003
hctr:     0x0900

If bigphys says "no", the driver has been compiled without bigphysarea support. Go re-compile and reinstall the driver.
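The key fields can also be checked automatically. The sketch below compares them with the values from the sample output on this page; those expected values are an assumption (a working system with different firmware or driver versions might legitimately report something else), and the function names mce_dsp_field and check_mce_dsp are invented for this example.

```shell
# Print the value of one "name: value" field from the driver status file.
#   $1 = field name, $2 = file (defaults to /proc/mce_dsp)
# If a name appears more than once (e.g. "state"), the first match is used.
mce_dsp_field() {
    awk -v key="$1" -F: '
        { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }
        name == key { val = $2; gsub(/^[ \t]+|[ \t]+$/, "", val); print val; exit }
    ' "${2:-/proc/mce_dsp}"
}

# Compare the important fields with the values shown on this page.
#   $1 = file (defaults to /proc/mce_dsp)
check_mce_dsp() {
    md_file="${1:-/proc/mce_dsp}"
    [ -r "$md_file" ] || { echo "driver not loaded: cannot read $md_file"; return 1; }
    md_status=0
    while read -r md_key md_want; do
        md_got=$(mce_dsp_field "$md_key" "$md_file")
        if [ "$md_got" != "$md_want" ]; then
            echo "unexpected $md_key: '$md_got' (expected '$md_want')"
            md_status=1
        fi
    done <<EOF
fakemce no
bigphys yes
mode quiet mode
hstr 0x0003
hctr 0x0900
EOF
    return $md_status
}
```

check_mce_dsp prints one line per field that differs and returns non-zero if any did, so silence means the status file matches the sample.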

If the product of "count" and "size" is much smaller than about 10 MB, the driver's data buffer is too small. (In the sample output above, 4882 × 0x800 is about 10 MB, assuming "size" is given in bytes.) You should still be able to issue MCE commands cleanly, but multi-frame acquisitions will likely fail.
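The product can be computed directly from the status file. This sketch assumes "size" is a byte count printed in hexadecimal, as in the sample output; the function name buffer_bytes is hypothetical.

```shell
# Print count * size for the driver data buffer.
#   $1 = file to read (defaults to /proc/mce_dsp)
buffer_bytes() {
    bb_file="${1:-/proc/mce_dsp}"
    bb_count=$(awk '$1 == "count:" { print $2; exit }' "$bb_file")
    bb_size=$(awk '$1 == "size:" { print $2; exit }' "$bb_file")
    [ -n "$bb_count" ] && [ -n "$bb_size" ] || return 1
    # Shell arithmetic accepts the 0x prefix, so the hex size works as-is.
    echo $(( $bb_count * $bb_size ))
}
```

For the sample output above this gives 4882 × 2048 = 9998336, just under 10 MB, which is the sort of value a working system should report.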

If "mode" is not "quiet mode", you may be using out-dated PCI card firmware. Inspect /var/log/messages for messages from the driver (they are marked "mceds"; you can grep for "dsp_query_version"). The "PCI card DSP code version" should be at least U0104.
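A quick way to find the most recent version report is to grep the log and keep the last match. This sketch only prints the matching line and does not try to parse it, since the exact message format is not described here; the function name last_dsp_version_line is made up.

```shell
# Print the most recent log line that mentions dsp_query_version.
#   $1 = log file (defaults to /var/log/messages)
last_dsp_version_line() {
    grep "dsp_query_version" "${1:-/var/log/messages}" | tail -n 1
}
```

Reading /var/log/messages may require root privileges on some systems. No output means the driver has not logged a version query in that file.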

Are the MAS device nodes present?

Run:

ls -l /dev/mce_*

You should see something like

crw-rw-r-- 1 root mce 252, 0 2008-09-03 00:05 /dev/mce_cmd0
crw-rw-r-- 1 root mce 251, 0 2008-09-03 00:05 /dev/mce_data0
crw-rw-r-- 1 root mce 253, 0 2008-09-03 00:05 /dev/mce_dsp0

If you do not, run

sudo mas_mknodes

and check again. If the mknodes script fails, it is probably because the driver is not loaded.
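The same check can be scripted. The sketch below looks for the three node names from the listing above and tests that each is a character device; the function name check_mce_nodes is hypothetical, and systems with more than one card may have additional nodes that it does not look for.

```shell
# Report any of the three MAS device nodes that are missing.
#   $1 = directory to look in (defaults to /dev)
check_mce_nodes() {
    mn_dir="${1:-/dev}"
    mn_missing=0
    for mn_node in mce_cmd0 mce_data0 mce_dsp0; do
        # -c is true only for an existing character device.
        if [ ! -c "$mn_dir/$mn_node" ]; then
            echo "missing: $mn_dir/$mn_node"
            mn_missing=1
        fi
    done
    return $mn_missing
}
```

If it reports missing nodes, run sudo mas_mknodes as described above and call check_mce_nodes again.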

Is the MAS logging server running?

Can the system communicate with the PCI card?

Can the system communicate with the MCE?