Threw 33 million 1k docs at a fairly simple ingest, and the threads for our steps don't seem to be doing much.

No thread seems to be pegged, but several Cassandra-related threads are busier than any of ours:
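For anyone who wants to repeat the per-thread comparison, here is a minimal sketch using the JDK's standard `ThreadMXBean`. It is not the tool the observation above came from. The class name `ThreadCpuSnapshot` and the method `busiest` are hypothetical, and it only sees threads inside the JVM it runs in (for example the client driver threads in the ingest process, not the Cassandra server's own threads).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: ranks live threads in this JVM by cumulative CPU time.
final class ThreadCpuSnapshot {

    /** Returns up to n (thread name, CPU nanoseconds) pairs, busiest first. */
    static List<Map.Entry<String, Long>> busiest(int n) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Map.Entry<String, Long>> rows = new ArrayList<>();
        // Not every JVM supports or enables per-thread CPU timing.
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return rows;
        }
        for (long id : mx.getAllThreadIds()) {
            long cpuNanos = mx.getThreadCpuTime(id); // -1 if the thread is gone
            ThreadInfo info = mx.getThreadInfo(id);  // null if the thread is gone
            if (cpuNanos >= 0 && info != null) {
                rows.add(new AbstractMap.SimpleImmutableEntry<>(
                        info.getThreadName(), cpuNanos));
            }
        }
        // Sort descending by CPU time so the busiest threads come first.
        rows.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        return new ArrayList<>(rows.subList(0, Math.min(n, rows.size())));
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Long> row : busiest(10)) {
            System.out.printf("%-40s %8d ms%n",
                    row.getKey(), row.getValue() / 1_000_000);
        }
    }
}
```

The numbers are cumulative since each thread started, so taking two snapshots a few seconds apart and diffing them gives a better picture of which threads are busy right now. Telling our step threads from the Cassandra-related ones relies on thread names, which is an assumption about how the threads are named.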

One possible place to look for a performance gain is avoiding the index on the status column.