doc/production.rst (57 additions, 0 deletions)
@@ -4,6 +4,15 @@
Production
**********

Unlike the development environment, which uses pre-configured Vagrant virtual
machines, production machines require some preliminary tasks before the
provisioning procedure can complete successfully. You have to configure the
network interfaces and the disk partitions of the machines to be provisioned,
and install the desired operating system on them (CentOS 6.8 for the machines
running ACS, CentOS 7.2 for storage). Without these preliminary tasks, the
provisioning procedure will most likely fail.

Machines deployment
===================
To deploy the system in production, you have to specify a *cluster* of machines,
@@ -74,3 +83,51 @@ tag you want to install on the machines:
argument from both the ``discos-deploy`` and ``discos-get`` scripts. If you
pass the ``--station`` argument anyway and its value does not match the
correct station, you will receive an error and the procedure will stop.

Replace the Manager in case of failure
--------------------------------------
If the Manager machine suffers a failure of any kind, it has to be replaced.
To do this, first put the new Manager's IP address in the Ansible inventory's
hosts file, then perform the provisioning procedure on the newly installed
machine. For the whole system to behave correctly, some manual tweaks are
also necessary on the other DISCOS machines when the DISCOS control system
runs in a distributed environment, as is the case for the SRT and Medicina
stations.

The following tweaks are required for the DISCOS control system to work as
expected:

- Replace the old ACS Manager IP address with the new one in the
  ``/discos-sw/config/misc/bash_profile`` file on the ``discos-console``
  machine, where it is stored in an environment variable called ``MNG_IP``.
- Replace the old Manager IP address with the new one in some files of the
  DISCOS CDB. More specifically, one file has to be corrected for the control
  system to be able to communicate with the ``TotalPower`` backend. You can
  find this file in the repository of the currently deployed DISCOS release,
  at ``SRT/Configuration/CDB/alma/BACKENDS/TotalPower/TotalPower.xml``.
  The variable to be corrected is called ``DataIPAddress``. This has to be
  done on the new Manager machine itself, before launching the control
  system.
- Make sure that all the station systems and machines accept incoming
  connections from the new Manager's IP address. Specifically, the
  ``TotalPower`` backend and the ``CalMux`` machines have to be reconfigured
  so that they can be controlled by the new Manager.

Review comment (Member): Where is the procedure?

Review comment (Member Author): This procedure involves logging in to the
said machines as root; if it has to be documented, this is not the place to
do it. A suggestion is to perform this step in advance by allowing a range of
addresses to control the said machines, so that, in case of failure, this
step can be skipped.

Review comment (Member): It is not clear to me how it is possible to replicate
the Manager without any information about this point. I think the procedure
should be documented somewhere, and if this is not the place, we have to put
a reference link to it here.
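The two file edits in the list above (``MNG_IP`` in ``bash_profile`` and
``DataIPAddress`` in ``TotalPower.xml``) can be sketched in Python as follows.
This is a minimal, hypothetical helper (``set_manager_ip.py`` is not part of
DISCOS). It assumes that ``MNG_IP`` is assigned as ``MNG_IP=<address>``,
optionally preceded by ``export``, and that ``DataIPAddress`` is an XML
attribute in ``TotalPower.xml``; check both files before relying on it. It
keeps a ``.bak`` copy of each file and refuses to write if the expected
assignment is not found.

```python
"""Hypothetical helper: point two DISCOS settings at a replacement Manager.

Assumptions (not taken from DISCOS itself):
* bash_profile assigns the address as MNG_IP=<addr>, optionally preceded
  by 'export' and optionally quoted;
* TotalPower.xml stores the address as the XML attribute
  DataIPAddress="<addr>".
"""
import ipaddress
import re
import sys
from pathlib import Path
from typing import Callable

# Matches e.g. 'export MNG_IP=192.168.1.10' or 'MNG_IP="192.168.1.10"';
# commented-out lines are skipped because '#' may not precede MNG_IP.
_MNG_IP_RE = re.compile(
    r'^([ \t]*(?:export[ \t]+)?MNG_IP=)(["\']?)[^"\'\s]*\2', re.MULTILINE
)
# Matches DataIPAddress="..." or DataIPAddress='...'.
_DATA_IP_RE = re.compile(r'(\bDataIPAddress\s*=\s*)(["\'])[^"\']*\2')


def _substitute(pattern, text: str, new_ip: str, what: str) -> str:
    """Replace the address matched by pattern; fail loudly if absent."""
    ipaddress.ip_address(new_ip)  # raises ValueError for a malformed address
    # A function replacement avoids template escapes such as '\1'
    # colliding with the digits of the address.
    new_text, count = pattern.subn(
        lambda m: m.group(1) + m.group(2) + new_ip + m.group(2), text
    )
    if count == 0:
        raise ValueError(f"no {what} assignment found")
    return new_text


def set_mng_ip(profile_text: str, new_ip: str) -> str:
    """Return bash_profile text with MNG_IP set to new_ip."""
    return _substitute(_MNG_IP_RE, profile_text, new_ip, "MNG_IP")


def set_data_ip(xml_text: str, new_ip: str) -> str:
    """Return TotalPower.xml text with DataIPAddress set to new_ip."""
    return _substitute(_DATA_IP_RE, xml_text, new_ip, "DataIPAddress")


def rewrite(path: Path, transform: Callable[[str, str], str], new_ip: str) -> None:
    """Apply transform to a file in place, keeping a .bak copy first."""
    original = path.read_text()
    updated = transform(original, new_ip)  # raises before anything is written
    path.with_name(path.name + ".bak").write_text(original)
    path.write_text(updated)


def main(argv: list) -> None:
    # 'profile' is meant for discos-console, 'cdb' for the new Manager.
    transforms = {"profile": set_mng_ip, "cdb": set_data_ip}
    if len(argv) != 3 or argv[0] not in transforms:
        print("usage: set_manager_ip.py {profile|cdb} NEW_IP FILE")
        return
    mode, new_ip, target = argv
    rewrite(Path(target), transforms[mode], new_ip)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Since the text places the two edits on different machines, the sketch would
be run with ``profile`` on ``discos-console`` and with ``cdb`` on the new
Manager, passing the path of the deployed ``TotalPower.xml`` as ``FILE``.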


For the whole environment to work properly, it is also necessary to perform
some other tweaks on the other DISCOS machines, which are not related to the
control system itself:

- Replace the old Manager IP address with the new one in the ``/etc/hosts``
  file on the ``discos-console`` and ``discos-storage`` machines (in case the
  DISCOS control software is running in a distributed environment). This
  allows other services, such as the Lustre service on the ``discos-storage``
  machine, to point to the correct IP address again.

Review comment (Member): Is there a procedure to point to?

- Perform the ssh key exchange procedure between the ``discos`` user of the
  newly installed Manager and the ``discos`` users on the ``discos-console``
  and ``discos-storage`` machines. The same procedure has to be performed
  between the ``root`` users as well. This allows scripts and services such
  as the Lustre service on the ``discos-storage`` machine and the
  ``discos-addProject`` and ``discos-removeProject`` scripts on the
  ``discos-console`` machine to perform remote tasks that would otherwise be
  impossible.

Review comment (Member): Does Mauro do all these things? :-D We need an
example for him :-)

Review comment (Member Author): This is not a procedure that a generic
observer can perform. Performing the ssh key exchange requires knowing the
passwords of both the discos and the root users.

Review comment (Member): I was joking; the point is that we have to write the
documentation assuming that the reader is not a member of the DISCOS team...
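The ``/etc/hosts`` update and the ssh key exchange described above can be
sketched as follows, for readers outside the DISCOS team. This is a
hypothetical helper (``replace_manager.py`` is not part of DISCOS): it
assumes Python 3.6+ and OpenSSH's ``ssh-keygen`` and ``ssh-copy-id`` on every
machine, and takes the Manager's hostname as an argument. ``replace_host_ip``
rewrites only entries whose address field is exactly the old IP, while
``key_exchange_commands`` only *prints* the commands, because ``ssh-copy-id``
prompts for the remote users' passwords.

```python
"""Hypothetical helper for the /etc/hosts and ssh key exchange steps.

Assumptions (not DISCOS tooling): Python 3.6+, OpenSSH ssh-keygen and
ssh-copy-id installed everywhere, Manager hostname given on the command line.
"""
import sys
from pathlib import Path
from typing import List, Tuple

PEERS = ("discos-console", "discos-storage")
USERS = ("discos", "root")


def replace_host_ip(hosts_text: str, old_ip: str, new_ip: str) -> str:
    """Swap old_ip for new_ip in entries whose address field is exactly old_ip.

    Comparing the whole first field (instead of str.replace) leaves e.g.
    192.168.1.10 untouched when old_ip is 192.168.1.1, and skips comments.
    """
    lines = []
    for line in hosts_text.splitlines(keepends=True):
        stripped = line.lstrip()
        indent = line[: len(line) - len(stripped)]
        if stripped.split(None, 1)[:1] == [old_ip]:
            line = indent + new_ip + stripped[len(old_ip):]
        lines.append(line)
    return "".join(lines)


def key_exchange_commands(manager: str) -> List[Tuple[str, str, List[str]]]:
    """Return (machine, user, command) triples for a two-way key exchange.

    Each command is meant to be run by hand, as `user` on `machine`;
    ssh-copy-id asks once for the remote user's password.
    """
    cmds = []
    for user in USERS:
        # The fresh Manager has no key pair yet; ssh-copy-id needs one.
        cmds.append((manager, user, ["ssh-keygen", "-t", "rsa"]))
        for peer in PEERS:
            # Manager -> peer.
            cmds.append((manager, user, ["ssh-copy-id", f"{user}@{peer}"]))
            # Peer -> Manager: first forget the failed Manager's host key
            # (repeat with its IP if known_hosts also lists it by address).
            cmds.append((peer, user, ["ssh-keygen", "-R", manager]))
            cmds.append((peer, user, ["ssh-copy-id", f"{user}@{manager}"]))
    return cmds


def main(argv: List[str]) -> None:
    if len(argv) == 3 and argv[0] == "hosts":
        # Run as root on discos-console and discos-storage.
        hosts = Path("/etc/hosts")
        hosts.write_text(replace_host_ip(hosts.read_text(), argv[1], argv[2]))
    elif len(argv) == 2 and argv[0] == "keys":
        for machine, user, cmd in key_exchange_commands(argv[1]):
            print(f"[{user}@{machine}] {' '.join(cmd)}")
    else:
        print("usage: replace_manager.py hosts OLD_IP NEW_IP | keys MANAGER_HOST")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Printing the commands rather than executing them keeps each password prompt
visible to the operator, who still needs the ``discos`` and ``root``
passwords mentioned in the review thread.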