Zeynep Demiragli edited this page Aug 3, 2015 · 36 revisions

To-Do

  • Private Data Transfers
  • Write tools to make fixing the bookkeeping easier.
  • Update the transfer service to handle the TERM signal more gracefully.
  • Implement the code needed to split the folders so that the number of files in a given run stays limited, both for the mergers and for the transfer.
  • For files in the bad area, also move the lock files. Here is an example:
/store/lustre/mergeMacro/run238685/bad/run238685_ls0111_streamNanoDST_StorageManager.lock
/store/lustre/transfer/run238685/bad/run238685_ls0111_streamNanoDST_StorageManager.dat
/store/lustre/transfer/run238685/bad/run238685_ls0111_streamNanoDST_StorageManager.jsn
  • Restore the 60 s cool-off time before the bookkeeper call in eor.py. It was removed in commit 5eb315.
  • Update the filename of the log file used for inject worker notifications to follow the pattern <date>-<hostname>-<instance>.log, see https://github.com/smpro/transfer-scripts/blob/master/inject/compat/closeFile.pl#L110
  • Make sure that the stream rates are filled in WBM even for runs with the TIER0_TRANSFER_OFF run key(?).
  • Use WatchedFileHandler + logrotate for log-file rotation: http://stackoverflow.com/questions/10235220/python-logging-logrotate-options
  • Clean up successfully transferred and handled files. The old production setup has a cleanup cron job that also updates the database and marks the files as deleted.
  • Python / Perl transfer RPM and configs in Puppet
  • For eor, remove the time out (minutes to hours) since the last added metadata JSON before brute-force closing.
  • DQM delivery from merger to transfers
    • Update streams_to_dqm in smhookd.conf
    • Update the dqm path in smhookd.conf
    • Update the /dqmburam mount definition to point to RAM /fff/input
  • Update /etc/init.d/functions-storagemanager to check for stale lock files during service start.
  • Copy the Event Display to the NFS and EOS areas.
  • Use srv to transfer old MiniDAQ runs
    • with the MiniDAQ setup label. The setup label is a configuration parameter.
  • Implement some more functionality in eos file info:
    • Calculate the Adler checksum after copying if the jsn checksum and the destination checksum don't match.
    • If the checksums are not the same, print out "FAIL" and retry just once ...
  • Log a WARNING if the trigger rate latency for WBM is too long (> 20s ?).
  • Castor -> EOS
  • Only transfer runs with some data
  • Update the view of SM instances so that it also works for mrg* machines in addition to srv-c2c* machines.
  • Separate the transfers for DQM from the rest of the streams to achieve low latency
  • Require that the number of MiniEoR files is the same as the number of ini files in macroeor
  • Veto eventsRunTotal < eventsInputStreams in macroeor
  • Use Python standard library for transfer logging
  • Honor transfer flag
  • Exclude runs with run number > 10^9
  • Reduce logging verbosity of watchAndInject
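
The logging items above (WatchedFileHandler + logrotate, the Python standard library for transfer logging, and the <date>-<hostname>-<instance>.log file name) could be combined roughly as follows. This is only a sketch: the helper names `make_log_path` and `get_transfer_logger`, the YYYYMMDD date format, and the log line format are assumptions for illustration and are not taken from the transfer scripts.

```python
import logging
import logging.handlers
import os
import socket
import time


def make_log_path(log_dir, instance, hostname=None, date=None):
    """Build <log_dir>/<date>-<hostname>-<instance>.log.

    The YYYYMMDD date format is an assumption; the to-do item only
    says <date>.
    """
    hostname = hostname or socket.gethostname()
    date = date or time.strftime("%Y%m%d")
    return os.path.join(log_dir, "%s-%s-%s.log" % (date, hostname, instance))


def get_transfer_logger(log_path, name="transfer", level=logging.INFO):
    """Return a logger that writes to log_path via WatchedFileHandler."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid attaching duplicate handlers
        # WatchedFileHandler reopens the file if it was moved or
        # deleted, e.g. by logrotate.
        handler = logging.handlers.WatchedFileHandler(log_path)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)s %(name)s: %(message)s"))
        logger.addHandler(handler)
    return logger
```

Because WatchedFileHandler reopens the log file once logrotate has moved it away, rotation can stay entirely in the logrotate configuration and the service does not need to be restarted or signalled.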

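For the checksum item in the list above, one possible reading is: after each copy, compute the Adler-32 checksum of the destination file, compare it with the checksum from the jsn file, and on a mismatch print "FAIL" and retry the copy once. The sketch below follows that reading. The names `adler32_of_file` and `copy_with_verification`, the `copy_func` callable, and passing the jsn checksum as an integer are assumptions made for illustration.

```python
import zlib


def adler32_of_file(path, chunk_size=1024 * 1024):
    """Compute the Adler-32 checksum of a file, reading it in chunks."""
    value = 1  # Adler-32 starting value
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            value = zlib.adler32(chunk, value)
    return value & 0xFFFFFFFF


def copy_with_verification(copy_func, destination_path, expected_checksum,
                           retries=1):
    """Run copy_func, verify the destination checksum, retry on mismatch.

    copy_func is a hypothetical callable that performs the actual copy.
    expected_checksum is the checksum from the jsn file, as an integer.
    Returns True if a copy verified, False if every attempt failed.
    """
    for _ in range(retries + 1):
        copy_func()
        if adler32_of_file(destination_path) == expected_checksum:
            return True
        print("FAIL")
    return False
```

With the default `retries=1` the copy is attempted at most twice, which matches the "retry just once" wording.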