STOP Command

From MCEWiki

Background

The stop command was invented to allow users to stop data acquisitions in mid-stream. There are a variety of reasons for wanting to do so:

  • Malfunction of other subsystems at the telescope
  • Not receiving any DV pulses from the Sync Box or other triggering software
  • Closing off a long data acquisition
  • A hang of the Clock Card firmware

How to issue a STOP command

From a MAS shell:

> mce_run mce_run_1042 10000 s &
> sleep 2
> mce_cmd -x stop rcs ret_dat
> sleep 1

In mce_cmd interactive mode, the stop command can be issued as:

> stop <card_addr> ret_dat

In order to stop the MAS data process only (from a shell):

> mce_cmd -x fakestop
> mce_cmd -x mce_reset
> mce_cmd -x dsp_reset
> mce_cmd -x acq_flush

If that doesn't work, try unloading and reloading the PCI driver.
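
The command sequences above can also be scripted. The following is a minimal Python sketch, not part of MAS: the helper names stop_acquisition and recover_mas and the injectable run argument are hypothetical, and only the mce_cmd invocations are copied from the examples on this page. It assumes that mce_cmd returns a non-zero exit status on failure, which this page does not confirm.

```python
import subprocess

# Command lines copied from the shell examples on this page.
STOP_CMD = ["mce_cmd", "-x", "stop", "rcs", "ret_dat"]
RECOVERY_CMDS = [
    ["mce_cmd", "-x", "fakestop"],
    ["mce_cmd", "-x", "mce_reset"],
    ["mce_cmd", "-x", "dsp_reset"],
    ["mce_cmd", "-x", "acq_flush"],
]


def _default_run(argv):
    """Run one command and return its exit status."""
    return subprocess.run(argv).returncode


def stop_acquisition(run=_default_run):
    """Issue the STOP command.

    Returns True on exit status 0 (assumption: 0 means success).
    """
    return run(STOP_CMD) == 0


def recover_mas(run=_default_run):
    """Run the MAS-only recovery commands in the order listed above.

    Returns the command lines that reported a non-zero exit status.
    """
    failed = []
    for argv in RECOVERY_CMDS:
        if run(argv) != 0:
            failed.append(" ".join(argv))
    return failed
```

Passing a replacement for run makes it possible to dry-run the sequence on a machine without MAS installed. If recover_mas returns a non-empty list, the next step described above is to unload and reload the PCI driver.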

How does the MCE handle a STOP command

The STOP command is supported as a special command in the Clock Card firmware. Unlike its handling of WB and RB commands, the MCE replies to the STOP command at its leisure, and the reply does not necessarily arrive in order with the data packets being returned.

Data packets continue to be returned following the reply to the STOP command until all of the remaining ret_dat commands are flushed from the MCE. This means that either one or two data packets are returned following the receipt of a STOP command by the Clock Card. The last data packet has the 'stop' and 'last_frame' bits set in the status frame header. With MAS, a certain amount of dead-time is required between the reply to the STOP command and the next frame of data. This dead-time is hard-coded as 10 ms in the Clock Card firmware. With a delay of 1 ms, the PCI Card was not ready to receive the final data packet in 50% of STOP trials. The delay can be adjusted by using the 'stop_dly' command, whose units are microseconds (us).
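
To make the end-of-acquisition behaviour concrete, here is a hedged Python sketch. It models each frame as a dict with boolean 'stop' and 'last_frame' entries rather than decoding the real header word, because the bit positions are not given on this page. The helper names ms_to_stop_dly and frames_after_stop are hypothetical.

```python
def ms_to_stop_dly(delay_ms):
    """Convert a dead-time in milliseconds to the microsecond units of stop_dly."""
    return int(round(delay_ms * 1000))


def frames_after_stop(frames):
    """Return the frames up to and including the first one flagged as final.

    Each frame is a dict with boolean 'stop' and 'last_frame' entries,
    a stand-in for the status bits in the real frame header.
    Raises ValueError if no frame has both flags set.
    """
    tail = []
    for frame in frames:
        tail.append(frame)
        if frame["stop"] and frame["last_frame"]:
            return tail
    raise ValueError("no frame with both 'stop' and 'last_frame' set")
```

For the 10 ms dead-time mentioned above, ms_to_stop_dly(10) gives 10000. Given the behaviour described, frames_after_stop applied to the frames received after the STOP reply would be expected to return one or two frames.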

When a STOP command is issued outside of a data run, no data packets are returned. When a STOP command is issued during a data run, the timing of the last data frame does not generally follow the timing specified by the data_rate parameter (read with 'rb cc data_rate'). In general, the last ret_dat is queued up as quickly as possible, irrespective of the value of use_dv (read with 'rb cc use_dv'). For example, when the Clock Card is sourcing its DV pulses from the Sync Box and a STOP command arrives, it does not wait for the next DV pulse; instead it issues the last ret_dat immediately. If the Clock Card waited, it would hang whenever the reason for the STOP was that the source of the DV pulses was not functioning correctly to begin with.

Test Cases

The cmd_translator block on the Clock Card is the block that nominally runs data acquisitions. It is a complicated piece of code, and requires simulation of at least the following cases:

  • Acquisition of one frame of data
  • Acquisition of multiple frames of data
  • Acquisition while sourcing the DV from the Sync Box (use_sync=2, use_dv=2, select_clk=1)
  • Acquisition while sourcing the DV from the Sync Box's input (use_sync=2, use_dv=2, select_clk=1)
  • Acquisition while sourcing the DV from the Sync Box and disconnecting the Sync Box fibre
  • Acquisition while sourcing the DV from the Sync Box with the fibre initially disconnected
  • Acquisition while turning the Sync Box output off and then on


All the cases above should be repeated in the following scenarios:

  • a STOP command should be issued before the first frame is returned
  • a STOP command should be issued during the acquisition
  • a STOP command should be issued after the acquisition
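
The full set of simulations is the cross product of the two lists above. As a rough sketch, it can be enumerated in Python as follows. The short case labels and the function name build_test_matrix are shorthand invented here for illustration and are not identifiers from the firmware test bench.

```python
from itertools import product

# Shorthand labels for the acquisition cases listed above.
ACQUISITION_CASES = [
    "one frame",
    "multiple frames",
    "DV from Sync Box",
    "DV from Sync Box input",
    "Sync Box fibre disconnected during acquisition",
    "Sync Box fibre initially disconnected",
    "Sync Box output turned off then on",
]

# Points at which the STOP command is issued.
STOP_SCENARIOS = [
    "before first frame",
    "during acquisition",
    "after acquisition",
]


def build_test_matrix():
    """Pair every acquisition case with every STOP scenario."""
    return list(product(ACQUISITION_CASES, STOP_SCENARIOS))


if __name__ == "__main__":
    for number, (case, scenario) in enumerate(build_test_matrix(), start=1):
        print(f"{number:2d}. {case} / STOP {scenario}")
```

Seven cases and three scenarios give 21 combinations to simulate.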

Note: During the testing of STOP commands in the sys_v05000000 tag of firmware, it was found that whenever a malfunction with stopping occurred, the Clock Card had been in the process of sending a data packet to the PCI Card when a STOP command was issued by the PCI Card. Further investigation revealed that the PCI Card required an inordinate amount of time to process the reply to the STOP command, which caused an overflow in the PCI Card buffer space. By making changes to both the PCI firmware and the Linux driver, we were able to increase the STOP reply processing bandwidth to a level where STOP and On-The-Fly errors no longer occurred.