Bulk reads can overwhelm the I/O interrupt callback queue and cause values to fall behind indefinitely

I am having an issue with bulk reads. When the system becomes performance constrained (a large number of parameters to process in bulk reads, or the ioc server running slowly), the I/O interrupts produced by the bulk read thread can overwhelm the I/O interrupt callbacks, causing some to appear to be dropped.

Disclaimer: I know I am not running the latest version of the community twincat-ads module, as we have a forked version at SLAC with some changes. However, I have reviewed the bulkReadThread and adsParameterUpdate portions, and they appear to be almost exactly the same as what we have. So I am wondering whether other members of the community have seen this issue and whether there are any suggestions for how to handle this case.

With what we are seeing, when parameters are dropped, the values reported by the ioc remain behind indefinitely. For example, say a value is updating at a rate of 1 Hz in TwinCAT and the last 5 values for a particular record are:

t = 0 -> value = 1.3 <-- oldest value
t = 1 -> value = 4.5
t = 2 -> value = 6.3
t = 3 -> value = 4.2
t = 4 -> value = 1.6 <-- latest value

In this case, once the ioc has fallen behind, reading the record might return 4.2, for example, rather than the latest value, 1.6.
Then, say the value stops updating at 1 Hz and starts updating at 0.1 Hz. Now the ioc has enough processing bandwidth to catch up. However, suppose a new value, 4.8, comes in:

t = 0 -> value = 1.3 <-- oldest value
t = 1 -> value = 4.5
t = 2 -> value = 6.3
t = 3 -> value = 4.2
t = 4 -> value = 1.6
t = 14 -> value = 4.8 <-- latest value

At this point, the reported value updates, but it becomes 1.6 instead of 4.8. So we are now able to process I/O interrupts in time, since new data arrives at only 0.1 Hz, but we indefinitely show an old value.
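To make the symptom concrete, here is a toy model in Python that reproduces the numbers above. It is only a sketch under assumptions, not a diagnosis of the driver: it assumes that values wait in a FIFO, that each processing pass of a record consumes exactly one queued value, and that a single processing request is lost while the ioc is overloaded. The names `ToyRecord`, `push`, `process`, and `replay` are hypothetical and do not come from twincat-ads, asyn, or EPICS.

```python
from collections import deque


class ToyRecord:
    """Hypothetical model: values wait in a FIFO, one is consumed per pass."""

    def __init__(self):
        self.pending = deque()
        self.reported = None

    def push(self, value, request_delivered=True):
        # The producer always queues the value, but the request to process
        # the record may be lost when the consumer is overloaded (assumption).
        self.pending.append(value)
        if request_delivered:
            self.process()

    def process(self):
        # One processing pass consumes exactly one queued value.
        if self.pending:
            self.reported = self.pending.popleft()


def replay():
    rec = ToyRecord()
    rec.push(1.3)                           # t = 0
    rec.push(4.5)                           # t = 1
    rec.push(6.3, request_delivered=False)  # t = 2, request lost under load
    rec.push(4.2)                           # t = 3, record shows 6.3
    rec.push(1.6)                           # t = 4, record shows 4.2
    after_load = rec.reported
    rec.push(4.8)                           # t = 14, record shows 1.6
    after_slowdown = rec.reported
    return after_load, after_slowdown, list(rec.pending)


if __name__ == "__main__":
    print(replay())
```

In this model, a single lost request leaves one value stranded in the FIFO, and every later update, no matter how slow, only ever pops the value in front of it. That matches the behavior described above, where the record stays one update behind even after the load is gone.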

I am able to reproduce this bug by taking the following steps:

1. I have a TwinCAT project with 260 variables, for which I create SCAN: I/O Intr records.
2. In the TwinCAT project, each variable is set to a random value every 1 s, but I can change the update rate by modifying the PT setting of a TON timer.
3. In the st.cmd file, I change the asynSetTraceMask value to 85 so that the ioc prints a lot of output. Printing that much slows the ioc down considerably, so it is an easy way to simulate a heavily loaded ioc.
```
## Asyn/ADS diagnostics configuration (always loaded)
#define ASYN_TRACE_ERROR     0x0001
#define ASYN_TRACEIO_DEVICE  0x0002
#define ASYN_TRACEIO_FILTER  0x0004
#define ASYN_TRACEIO_DRIVER  0x0008
#define ASYN_TRACE_FLOW      0x0010
#define ASYN_TRACE_WARNING   0x0020
#define ASYN_TRACE_INFO      0x0040
asynSetTraceMask("$(ASYN_PORT)", -1, 85)
```
4. I start up the ioc and log in to the PLC project in TwinCAT.
5. I start monitoring the values reported by the twincat-ads ioc and compare them to the live values I see in the PLC login session.
6. Once I see that the values reported by the ioc are lagging significantly, I change the TON timer in TwinCAT so that the values change only every 30 seconds.
7. Now, I observe that the values have fallen behind and they never catch up.
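For reference, the trace mask of 85 used in step 3 can be decoded with a few lines of Python. This sketch uses the bit names and values exactly as they are listed in the comment block of my st.cmd above. The helper name `decode_trace_mask` is made up for this example.

```python
# Bit values copied from the st.cmd comment block in step 3.
TRACE_BITS = {
    0x0001: "ASYN_TRACE_ERROR",
    0x0002: "ASYN_TRACEIO_DEVICE",
    0x0004: "ASYN_TRACEIO_FILTER",
    0x0008: "ASYN_TRACEIO_DRIVER",
    0x0010: "ASYN_TRACE_FLOW",
    0x0020: "ASYN_TRACE_WARNING",
    0x0040: "ASYN_TRACE_INFO",
}


def decode_trace_mask(mask):
    """Return the names of all trace bits that are set in mask."""
    return [name for bit, name in TRACE_BITS.items() if mask & bit]


if __name__ == "__main__":
    for name in decode_trace_mask(85):
        print(name)
```

85 is 0x55, so the enabled bits are ASYN_TRACE_ERROR, ASYN_TRACEIO_FILTER, ASYN_TRACE_FLOW, and ASYN_TRACE_INFO.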

I am wondering if there is some way in EPICS to ensure that all of the I/O interrupt callbacks from a prior bulk read have been processed before a new bulk read is allowed to start. I would rather have the bulk read run slower than desired than indefinitely show past values.
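To illustrate the kind of back-pressure I have in mind, here is a minimal sketch in plain Python threading. It is not EPICS or asyn code: the bulk read loop counts the callbacks it issues and blocks before the next cycle until the consumer has completed all of them. `CallbackGate`, `issued`, `completed`, `wait_drained`, and `run_demo` are hypothetical names, and how the "completed" notification would be obtained from the record processing side is exactly the open question.

```python
import queue
import threading


class CallbackGate:
    """Counts outstanding callbacks so a producer can wait for them to drain."""

    def __init__(self):
        self._cond = threading.Condition()
        self._outstanding = 0

    def issued(self, count=1):
        with self._cond:
            self._outstanding += count

    def completed(self):
        with self._cond:
            self._outstanding -= 1
            if self._outstanding == 0:
                self._cond.notify_all()

    def wait_drained(self, timeout=None):
        # Returns True once no callbacks are outstanding, False on timeout.
        with self._cond:
            return self._cond.wait_for(lambda: self._outstanding == 0, timeout)


def run_demo(cycles=3, params=5):
    gate = CallbackGate()
    work = queue.Queue()
    processed = []
    backlog_at_cycle_start = []

    def consumer():
        # Stands in for the thread that runs the I/O interrupt callbacks.
        while True:
            item = work.get()
            if item is None:
                return
            processed.append(item)
            gate.completed()

    worker = threading.Thread(target=consumer)
    worker.start()
    for cycle in range(cycles):
        gate.wait_drained()  # do not start a bulk read with callbacks pending
        backlog_at_cycle_start.append(work.qsize())
        gate.issued(params)  # count first, so the total cannot hit zero early
        for param in range(params):
            work.put((cycle, param))
    gate.wait_drained()
    work.put(None)  # tell the consumer to exit
    worker.join()
    return processed, backlog_at_cycle_start


if __name__ == "__main__":
    done, backlog = run_demo()
    print(len(done), backlog)
```

With this scheme the bulk read rate degrades to whatever the consumer can sustain, but the queue never holds more than one cycle of updates, so the reported values cannot fall further and further behind.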
