diff --git a/doc/production.rst b/doc/production.rst
index 7b7df866..8d272a1f 100644
--- a/doc/production.rst
+++ b/doc/production.rst
@@ -4,6 +4,14 @@
 Production
 **********
 
+Unlike the development environment, which uses pre-configured Vagrant virtual
+machines, production machines require some preliminary tasks before the
+provisioning procedure can complete successfully. You have to configure the
+network interfaces and the disk partitions of the machines to be provisioned,
+and install the required operating system on them (CentOS 6.8 for the machines
+running ACS, CentOS 7.2 for the storage machine). Without these preliminary
+tasks, the provisioning procedure will most likely fail.
+
 Machines deployment
 ===================
 To deploy the system in production, you have to specify a *cluster* of machines,
@@ -74,3 +82,47 @@ tag you want to install on the machines:
 argument from both the ``discos-deploy`` and ``discos-get`` scripts. If you
 pass the ``--station`` argument anyway, if the given argument does not match the
 correct station you will receive an error and the procedure will stop.
+
+Replace the Manager in case of failure
+--------------------------------------
+If the Manager machine fails, it has to be replaced. To do so, first add the
+new Manager's IP address to the hosts file of the Ansible inventory, then run
+the provisioning procedure on the newly installed machine. For the whole
+system to behave correctly, some manual changes are also necessary on the
+other DISCOS machines when the DISCOS control system runs on a distributed
+environment, as is the case for the SRT and Medicina stations.
+
+The following changes are required for the DISCOS control system to work as
+expected:
+
+- Replace the old ACS Manager IP address with the new one in the
+  ``/discos-sw/config/misc/bash_profile`` file on the ``discos-console``
+  machine. The address is stored in the ``MNG_IP`` environment variable.
+- Replace the old Manager IP address with the new one in the DISCOS CDB. In
+  particular, one file has to be corrected so that the control system can
+  communicate properly with the ``TotalPower`` backend. This file is located
+  in the repository of the currently deployed DISCOS release, at
+  ``SRT/Configuration/CDB/alma/BACKENDS/TotalPower/TotalPower.xml``, and the
+  variable to be corrected is called ``DataIPAddress``. This change has to be
+  made on the new Manager machine itself, before launching the control
+  system.
+- Make sure that all the station systems and machines accept incoming
+  connections from the new Manager's IP address. In particular, the
+  ``TotalPower`` backend and the ``CalMux`` machines have to be configured to
+  allow them to be controlled by the new Manager.
+
+For the whole environment to work properly, some further changes, not related
+to the control system itself, are also required on the other DISCOS machines:
+
+- Replace the old Manager IP address with the new one in the ``/etc/hosts``
+  file on the ``discos-console`` and ``discos-storage`` machines (when the
+  DISCOS control software runs on a distributed environment). This allows
+  other services, such as the Lustre service on the ``discos-storage``
+  machine, to point again to the correct IP address.
+- Exchange the SSH keys between the ``discos`` user of the newly installed
+  Manager and the ``discos`` users on the ``discos-console`` and
+  ``discos-storage`` machines, then repeat the procedure for the ``root``
+  users. This allows some scripts, such as the Lustre service on the
+  ``discos-storage`` machine and the ``discos-addProject`` and
+  ``discos-removeProject`` scripts on the ``discos-console`` machine, to
+  perform remote tasks that would otherwise be impossible.
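
The IP substitutions described in the new section (``MNG_IP`` in ``bash_profile``, ``DataIPAddress`` in ``TotalPower.xml`` and the Manager entry in ``/etc/hosts``) all reduce to replacing one address with another in a text file. The following Python 3 sketch shows one possible way to do that step. It is not part of DISCOS, the function names ``replace_ip`` and ``update_file`` are hypothetical, and it assumes the old address appears literally in each file. It matches the old address only as a whole token, so replacing ``192.168.1.1`` does not also rewrite ``192.168.1.10``, and it keeps a ``.bak`` copy of every file it changes.

```python
import re
import shutil
from pathlib import Path
from typing import Tuple


def replace_ip(text: str, old_ip: str, new_ip: str) -> Tuple[str, int]:
    """Replace whole-address occurrences of old_ip with new_ip in text.

    The lookbehind rejects a match preceded by a digit or a dot, and the
    lookahead rejects one followed by a digit or by a dot plus a digit, so
    replacing 192.168.1.1 leaves 192.168.1.10 untouched.
    Returns the new text and the number of replacements made.
    """
    pattern = re.compile(r"(?<![\d.])" + re.escape(old_ip) + r"(?!\.?\d)")
    # A function replacement inserts new_ip literally (no backslash escapes).
    return pattern.subn(lambda _match: new_ip, text)


def update_file(path: Path, old_ip: str, new_ip: str,
                backup: bool = True) -> int:
    """Rewrite old_ip as new_ip in the file at path; return the count.

    When at least one replacement is made and backup is True, the original
    file is first copied next to itself with a ".bak" suffix.
    """
    text = path.read_text()
    new_text, count = replace_ip(text, old_ip, new_ip)
    if count:
        if backup:
            shutil.copy2(str(path), str(path.with_name(path.name + ".bak")))
        path.write_text(new_text)
    return count
```

For example, on ``discos-console`` one could call ``update_file(Path("/discos-sw/config/misc/bash_profile"), OLD_IP, NEW_IP)`` and then do the same for ``/etc/hosts`` as ``root`` (``OLD_IP`` and ``NEW_IP`` are placeholders). The connection settings on the ``TotalPower`` backend and ``CalMux`` machines and the SSH key exchange are not covered by this sketch.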