PCI card hacking


This page describes using dsp_cmd and mce_cmd to debug MCE/PCI card communication issues. These instructions can be used to resolve questions like "is the data getting dropped by the PCI card, or does it never arrive at the PCI card?"

RAM areas on the DSP

The DSP has 3 RAM banks, known as X, Y, and P. User data is in X and Y, while the program code is in P. By default, all three banks contain roughly 2k x 16 bits. On our hardware, however, the Y bank has been extended with an external SRAM to 8M x 16 bits.

In our firmware, X memory is used to store ordinary variables while Y memory is used for large buffers.

Reading and writing DSP RAM with dsp_cmd

We can use dsp_cmd to read back the values of variables and buffers in the PCI card RAM. This is done as follows:

mhasse@mce-ubc-2:~$ dsp_cmd -x read X 0
This is dsp_cmd version MAS/slotpc/327
Line   0 : ok : X[0] = 0x208

We can also write. Don't write to X or P memory unless you really know what you're doing.

mhasse@mce-ubc-2:~$ dsp_cmd -x write Y 0 0x1234
This is dsp_cmd version MAS/slotpc/327
Line   0 : ok

It is convenient to define a few functions to make this hacking easier (you can just dump this into your bash session, or put it in a file and run "source file" whenever you want the functions to be available):

function pciread {
    # usage: pciread <BANK> <ADDR>
    # prints only the value field of the dsp_cmd reply
    dsp_cmd -qpx read "$1" "$2" | cut -d' ' -f5
}
function pciwrite {
    # usage: pciwrite <BANK> <ADDR> <VALUE>
    dsp_cmd -qpx write "$1" "$2" "$3"
}
function pcidump {
    # usage: pcidump <BANK> <START> <COUNT>
    local i
    for i in $(seq -f '%9.0f' "$2" $(( $2 + $3 - 1 ))); do
        printf "%s %#8x 0x%06x\n" "$1" "$i" $(pciread "$1" "$i")
    done
}
function pciwash {
    # usage: pciwash <BANK> <START> <COUNT> <VALUE>
    local i
    for i in $(seq -f '%9.0f' "$2" $(( $2 + $3 - 1 ))); do
        pciwrite "$1" "$i" "$4"
    done
}

Then you use them like this:

mhasse@mce-ubc-2:~$ pciread x 0
0x208
mhasse@mce-ubc-2:~$ pciwrite y 0 0x1234

mhasse@mce-ubc-2:~$ pcidump Y 0x100000 4
Y 0x100000 0x004f4b
Y 0x100001 0x00474f
Y 0x100002 0x000016
Y 0x100003 0x000003

MCE command, reply and data buffers

We can inspect the DSP's buffers if we know their addresses. These depend on the DSP firmware revision. The addresses below refer to Y memory.

Version            Data  Reply     Command
U0104 and earlier  0     0         0x200
U0105, U0106       0     0x100000  0x200000
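
The table can be wrapped in a small lookup helper so that scripts do not hard-code addresses. This is only a sketch: the name ybuf is hypothetical, the addresses are transcribed from the table above, and "U0104 and earlier" is represented by the single label U0104.

```shell
# ybuf <VERSION> <BUFFER>: print the Y-memory address of a DSP buffer.
# Hypothetical helper; addresses are copied from the table above.
ybuf() {
    case "$1:$2" in
        U0104:data|U0104:reply)       echo 0x0 ;;
        U0104:command)                echo 0x200 ;;
        U0105:data|U0106:data)        echo 0x0 ;;
        U0105:reply|U0106:reply)      echo 0x100000 ;;
        U0105:command|U0106:command)  echo 0x200000 ;;
        *) echo "ybuf: unknown version/buffer: $1 $2" >&2; return 1 ;;
    esac
}
```

With the functions defined earlier, a command-buffer dump could then be written as: pcidump y $(ybuf U0105 command) 128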

Notes about data buffers:

  • because DSP words are 24 bits wide, and MCE words are 32 bits wide, 32-bit data is broken into pairs of 16-bit words, with the lower order bits coming first. So the 32-bit value 0x12345678 is stored as [0x005678, 0x001234].
  • in reply and data packets, the preamble, packet type, and frame size are stripped off. These will not appear in the buffers in Y memory. They are available at special locations in X memory, however.
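
To read a 32-bit MCE word back out of a dump, the two buffer words have to be recombined. A minimal sketch, assuming the low-half-first ordering described in the first note; join32 is a hypothetical helper name.

```shell
# join32 <LOW> <HIGH>: combine two buffer words into one 32-bit value.
# Only the low 16 bits of each 24-bit DSP word carry data, so mask first.
join32() {
    printf '0x%08x\n' $(( (($2 & 0xffff) << 16) | ($1 & 0xffff) ))
}

join32 0x005678 0x001234    # prints 0x12345678
```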

(Incidentally, the collision between the "data" and reply/command buffers in version U0104 and earlier is why U0105 is required to be able to issue STOP or other MCE commands during frame acquisition.)

DSP internal variables

There are a few DSP internal variables, stored in X memory, that are of use in comm. hacking:

(For U0106, use the same addresses as U0105.)

Name             Address (U0104)  Address (U0105)  Description
STATUS           0x00             0x00             Status (and modes for <= U0104) of DSP; see dsp_status for an explanation
MODE             n/a              0x01             Flags configuring certain modes of the DSP; see dsp_status for an explanation
HEAD_W3_0        0x21             0x1F             Type of last reply or data packet, 'DA' or 'RP'
PACKET_SIZE_LOW  0x23             n/a              Size of latest reply or data packet, in 16-bit words
PACKET_SIZE      n/a              0x23             Size of latest reply or data packet, in 32-bit words
QT_FRAME_SIZE    0x53             0x40             The expected size of data packets, in bytes
FRAME_COUNT      0x01             0x02             Number of DA packets received from the MCE (including discards)
QT_DROPS         0x5A             0x47             Number of data packets dropped due to full PC RAM buffer
RP_DROPS         n/a              0x4B             Number of reply packets dropped due to PC RAM buffer unavailable (quiet_RP mode only)

Currently, the number of packets that are dropped due to a packet size mismatch between PACKET_SIZE and QT_FRAME_SIZE is not recorded explicitly. However, such packets will cause FRAME_COUNT to increment.

All of the above variables except STATUS, MODE, and FRAME_COUNT are cleared by a DSP_RESET. You can zero the frame count manually using pciwrite.

(Incidentally, the addresses in the table above are determined from the .lod symbol table of the assembler output.)
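
Remembering these addresses is error-prone, so a name-to-address lookup can help. The sketch below covers only the U0105/U0106 column of the table; xaddr is a hypothetical name, not part of the MCE tools.

```shell
# xaddr <NAME>: print the X-memory address of a DSP variable.
# Hypothetical helper; U0105/U0106 addresses only, copied from the table.
xaddr() {
    case "$1" in
        STATUS)        echo 0x00 ;;
        MODE)          echo 0x01 ;;
        FRAME_COUNT)   echo 0x02 ;;
        HEAD_W3_0)     echo 0x1F ;;
        PACKET_SIZE)   echo 0x23 ;;
        QT_FRAME_SIZE) echo 0x40 ;;
        QT_DROPS)      echo 0x47 ;;
        RP_DROPS)      echo 0x4B ;;
        *) echo "xaddr: unknown variable: $1" >&2; return 1 ;;
    esac
}
```

Combined with the functions defined earlier, the drop counter could then be read with: pciread x $(xaddr QT_DROPS)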

Examples

Firmware version U0105 is assumed below. Use the functions defined above.

Inspect a command packet

Note that in current firmware, packets are copied into the DSP buffer *at the same time* as they are transmitted to the MCE. So if your data appears in the DSP buffer, it has also been sent out on the FO transmitter.

We issue

mhasse@mce-ubc-2:~$ mce_cmd -x "wb cc led 7"

and then inspect the packet in the DSP buffer:

mhasse@mce-ubc-2:~$ pcidump y 0x200000 128
y 0x200000 0x00a5a5
y 0x200001 0x00a5a5
y 0x200002 0x005a5a
y 0x200003 0x005a5a
y 0x200004 0x002020
y 0x200005 0x005742
y 0x200006 0x000002
y 0x200007 0x000099
y 0x200008 0x000000
y 0x200009 0x000001
y 0x20000a 0x000000
y 0x20000b 0x000007
y 0x20000c 0x000000
y 0x20000d 0x000000
...
y 0x20007c 0x000000
y 0x20007d 0x000000
y 0x20007e 0x002022
y 0x20007f 0x0057dd
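
A dump like this can be sanity-checked by XORing its words together. The sketch below rests on two assumptions that are inferred from this dump rather than stated on this page: that the last 32-bit word of the packet is an XOR checksum over the 32-bit words following the two preamble words, and that each pair in the command buffer is stored high half first. cmd_xor is a hypothetical name.

```shell
# cmd_xor <HI> <LO> [<HI> <LO> ...]: XOR a list of 16-bit word pairs.
# ASSUMPTION: pairs are stored high half first, as this dump appears to be.
cmd_xor() {
    hi=0
    lo=0
    while [ $# -ge 2 ]; do
        hi=$(( hi ^ ($1 & 0xffff) ))
        lo=$(( lo ^ ($2 & 0xffff) ))
        shift 2
    done
    printf '0x%04x%04x\n' "$hi" "$lo"
}

# Words 0x200004 to 0x20000b of the dump above (after the preamble).
cmd_xor 0x2020 0x5742 0x0002 0x0099 0x0000 0x0001 0x0000 0x0007   # prints 0x202257dd
```

The result agrees with the final two words of the dump (0x2022, 0x57dd), provided the elided words are zero like their neighbours, since zero words do not change an XOR.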


Is the MCE returning data?

Suppose "mce_run" is blocking without data and we want to check the frame data. We first fill the data buffer with a signature we can recognize:

mhasse@mce-ubc-2:~$ pciwash y 0x0 4096 0x123456

and we may as well zero the frame counter, too.

mhasse@mce-ubc-2:~$ pciwrite x 0x2 0


Then we trigger a go somehow. mce_run is easiest.

mhasse@mce-ubc-2:~$ mce_run test_1935 1 1
RUNFILE_NAME=/data/cryo/current_data//test_1935.run
FRAME_BASENAME=/data/cryo/current_data//test_1935

This hangs, so we Ctrl-C it.

We confirm that the DSP received a single frame from the MCE:

mhasse@mce-ubc-2:~$ pciread x 2
0x1

Now inspect the data buffer:

mhasse@mce-ubc-2:~$ pcidump y 0x0 4096 > dump.txt

(This can take a while.) Inspecting the output we see reasonable data up to address 0x200:

...
y    0x1fd 0x00ffff
y    0x1fe 0x00f38d
y    0x1ff 0x00ffff
y    0x200 0x123456
y    0x201 0x123456
y    0x202 0x123456
...
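
Rather than scrolling through the dump, the boundary can be located mechanically. A small sketch, assuming the three-column pcidump output format shown above; first_sig is a hypothetical name.

```shell
# first_sig <SIGNATURE>: read pcidump output on stdin and print the first
# address whose value still equals the fill signature.
first_sig() {
    awk -v sig="$1" '$3 == sig { print $2; exit }'
}
```

Here, running first_sig 0x123456 < dump.txt would be expected to report 0x200, the first word the DSP did not overwrite.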

Because the FO FIFO can hold 1024 16-bit words, the 0x200 (512) words we see here are exactly one complete half-FIFO. The PCI card is likely blocking, waiting for the FIFO to be half-full again so it can empty it efficiently. It is at this point that we suspect that the packet size might be the problem.

Our RC1, 41-row frame should have size 372 DWORDS = 1488 bytes. We check that this is what the DSP expects (QT_FRAME_SIZE):

mhasse@mce-ubc-2:~$ pciread x 0x40
0x5d0

That's 1488 bytes, or 372 DWORDS. Good. Compare with the size of the most recent packet (PACKET_SIZE):

mhasse@mce-ubc-2:~$ pciread x 0x23
0x54c

That's 1356 DWORDS = 5424 bytes! That's the size of a full (4-RC, 41-row) frame.
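
The size arithmetic above can be scripted. In the sketch below, frame_dwords assumes 44 words of fixed overhead plus 8 columns per readout card per row; that formula is an assumption chosen because it reproduces both sizes quoted here (372 and 1356 DWORDS), not something stated on this page. size_check compares QT_FRAME_SIZE (bytes) against PACKET_SIZE (32-bit words). Both names are hypothetical.

```shell
# frame_dwords <N_RC> <ROWS>: expected frame size in 32-bit words.
# ASSUMPTION: 44 overhead words + 8 columns per readout card per row.
frame_dwords() {
    echo $(( 44 + 8 * $1 * $2 ))
}

# size_check <QT_FRAME_SIZE in bytes> <PACKET_SIZE in dwords>
# Accepts hex or decimal, e.g. the raw output of pciread.
size_check() {
    expected=$(( $1 ))
    actual=$(( $2 * 4 ))
    if [ "$expected" -eq "$actual" ]; then
        echo "match: $actual bytes"
    else
        echo "mismatch: expected $expected bytes, packet reports $actual bytes"
    fi
}

frame_dwords 1 41        # prints 372
frame_dwords 4 41        # prints 1356
size_check 0x5d0 0x54c   # prints mismatch: expected 1488 bytes, packet reports 5424 bytes
```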

However, the DSP buffered only 0x200 words * 2 bytes = 1024 bytes before stalling. It therefore appears that the MCE is reporting a 5424-byte packet but then sending somewhere between 1024 and 2047 bytes of data, a range that includes the 1488 bytes expected for a single-RC frame.
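
The bounds in this argument follow from the half-FIFO behaviour described earlier. As a sketch, assuming the DSP drains the FIFO only in whole half-FIFO blocks (sent_range is a hypothetical name):

```shell
# sent_range <BUFFERED_WORDS> <FIFO_WORDS>: lower and upper bound, in bytes,
# on the data sent, given how many 16-bit words reached the DSP buffer.
# ASSUMPTION: the FIFO is emptied only one full half-FIFO at a time.
sent_range() {
    half=$(( $2 / 2 ))
    low=$(( $1 * 2 ))
    high=$(( ($1 + half) * 2 - 1 ))
    echo "$low $high"
}

sent_range 0x200 1024   # prints 1024 2047
```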