CRAFT Spring '15
Jan Veverka edited this page Apr 2, 2015
19 March 2015 - 1 April 2015
- Transfers were delayed by files larger than 100 GB taken on Friday evening, and the transfer system is having a hard time catching up: https://hypernews.cern.ch/HyperNews/CMS/get/smops/823.html
- The PD-L1 rates are delayed for some runs: http://cmsonline.cern.ch/cms-elog/846019 and http://cmsonline.cern.ch/cms-elog/846106
- The transfer machine srv-c2c07-16 seems to be in bad shape. dmesg shows signs of rfcp having errors allocating pages.
  - Reported by Federico de G.: Recent runs like 238799 are appearing (even with some delay) on our ramdisk, but not on the output disk at /fff/output/transfer/; the last one I see there is 238649.
  - Rebooted srv-c2c07-16 on March 23 around 17:30 during the transfers of run 238752:
    - The ongoing transfers failed because of a wrong sequence of manual interventions. The correct sequence is: service sm stop; umount /store/lustre
    - After the reboot, one has to mount lustre and the DQM by hand:
      mount 10.180.9.17:/fff/output /dqmburam
      /bin/mount -t lustre -o flock 10.180.7.44@tcp0,10.180.7.45@tcp0:/cmsfs /store/lustre/
- Copy worker crashed at 13:52 on Tuesday 24 March and was restarted at 15:29; as a result, run 238832 has some missing files.
- Inject worker terminated at 03:25:02; transfers were restarted at 10:44:40 and caught up with data taking by 16:00.
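The manual intervention around the reboot of srv-c2c07-16 described above could be scripted roughly as follows. This is a minimal sketch, not the procedure that was actually used: the helper `run`, the `DRY_RUN` variable and the function names `before_reboot` and `after_reboot` are hypothetical, and the sketch assumes that the stray leading `mount` in the Lustre command is a typo. The commands themselves are taken from the notes. By default nothing is executed; the commands are only printed.

```shell
#!/bin/sh
# Sketch of the manual steps around a reboot of the transfer machine.
# DRY_RUN and run() are hypothetical helpers, not part of the original notes.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

before_reboot() {
    # Stop the sm service first; unmount Lustre only if that succeeded.
    run service sm stop && run umount /store/lustre
}

after_reboot() {
    # According to the notes, both mounts have to be redone by hand.
    run mount 10.180.9.17:/fff/output /dqmburam &&
        run /bin/mount -t lustre -o flock \
            10.180.7.44@tcp0,10.180.7.45@tcp0:/cmsfs /store/lustre/
}
```

With the default `DRY_RUN=1`, calling `before_reboot` prints `+ service sm stop` followed by `+ umount /store/lustre` without running either command. Setting `DRY_RUN=0` would execute them, which requires root privileges on the machine in question.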