Add Ansible playbook and roles to configure the host and deploy Koina server #194
sanjaysrikakulam wants to merge 2 commits into wilhelm-lab:main
Can you delete this file?
Yes, this PR deletes that file.
```yaml
- name: Enable UFW
  community.general.ufw:
    state: enabled

- name: Allow SSH (port 22)
  community.general.ufw:
    rule: allow
    port: '22'
    proto: tcp

- name: Allow HTTP (port 80)
  community.general.ufw:
    rule: allow
    port: '80'
    proto: tcp

- name: Allow HTTPS (port 443)
  community.general.ufw:
    rule: allow
    port: '443'
    proto: tcp

- name: Allow all outgoing traffic
  community.general.ufw:
    default: allow
    direction: outgoing

- name: Deny all other incoming traffic
  community.general.ufw:
    default: deny
    direction: incoming
```
My experience with Ansible is limited, but would it make sense to group this into a role as well?
Not sure what you mean. Would you like all of the firewall rules and changes moved into a dedicated role?
I have created a new role and moved these tasks to that role.
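A possible shape for that role, as a sketch only: the role name `firewall` and the path `roles/firewall/tasks/main.yml` are assumptions, since the PR's actual role name isn't shown here. It applies the same rules as the tasks above, but sets the default policies and adds the allow rules before UFW is enabled, so a new SSH connection cannot be refused in the window between enabling the firewall and adding the port 22 rule.

```yaml
# roles/firewall/tasks/main.yml (hypothetical role name and path)
# Policies and allow rules are written first; UFW is enabled last.
- name: Set default policies (deny incoming, allow outgoing)
  community.general.ufw:
    direction: "{{ item.direction }}"
    default: "{{ item.policy }}"
  loop:
    - { direction: incoming, policy: deny }
    - { direction: outgoing, policy: allow }

- name: Allow SSH (22), HTTP (80) and HTTPS (443)
  community.general.ufw:
    rule: allow
    port: "{{ item }}"
    proto: tcp
  loop: ['22', '80', '443']

- name: Enable UFW
  community.general.ufw:
    state: enabled
```

The playbook would then only need to list `firewall` under its `roles:` key.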
```yaml
- name: Install Nvidia driver for GPU  # This is not idempotent (need to re-work this).
  ansible.builtin.command: ubuntu-drivers install --gpgpu
  register: nvidia_install
  changed_when: "'installed' in nvidia_install.stdout"

- name: Detect installed Nvidia server driver versions
  ansible.builtin.command: bash -c "dpkg -l | awk '/nvidia-compute-utils-[0-9]+-server/{print $2}' | sort -V | tail -n 1"
  register: nvidia_driver_pkg
  changed_when: false

- name: Extract Nvidia driver version number
  ansible.builtin.set_fact:
    nvidia_driver_version: "{{ nvidia_driver_pkg.stdout | regex_search('[0-9]+') }}"

- name: Install matching Nvidia server-utils package
  ansible.builtin.apt:
    name: "nvidia-utils-{{ nvidia_driver_version }}-server"
    state: present
    update_cache: yes
  when: nvidia_driver_version is defined and nvidia_driver_version | length > 0

- name: Reboot if Nvidia driver was installed  # This is not idempotent due to the driver install step above (so reboot always runs :( ).
  ansible.builtin.reboot:
    msg: "Rebooting after Nvidia driver installation"
    pre_reboot_delay: 10
  when: "'installed' in nvidia_install.stdout"

- name: Check if nvidia-smi works
  ansible.builtin.command: nvidia-smi
  register: nvidia_smi
  failed_when: nvidia_smi.rc != 0
  changed_when: false
```
Same comment as above: all of these steps could be nicely bundled in a role.
I have created a new role and moved these tasks to that role.
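One way the two "not idempotent" notes could be addressed, sketched under assumptions: it reuses the `dpkg -l` detection command from the tasks above, but runs it *before* the install, calls `ubuntu-drivers install --gpgpu` only when no `nvidia-compute-utils-<N>-server` package is found, and gates the reboot on whether the install task actually ran (`is changed`) instead of on its stdout. The register name `nvidia_driver_pre` is hypothetical; the other names come from the existing tasks. It also adds a `default('', true)` guard so a "no match" from `regex_search` becomes an empty string.

```yaml
# Sketch: install the driver (and reboot) only when no Nvidia server
# compute-utils package is present yet. `nvidia_driver_pre` is a hypothetical name.
- name: Detect installed Nvidia server driver (before install)
  ansible.builtin.command: bash -c "dpkg -l | awk '/nvidia-compute-utils-[0-9]+-server/{print $2}' | sort -V | tail -n 1"
  register: nvidia_driver_pre
  changed_when: false

- name: Install Nvidia driver for GPU (skipped if one is already installed)
  ansible.builtin.command: ubuntu-drivers install --gpgpu
  register: nvidia_install
  # No changed_when: a command task that runs reports "changed",
  # and a skipped task reports "not changed".
  when: nvidia_driver_pre.stdout | length == 0

- name: Detect installed Nvidia server driver (after install)
  ansible.builtin.command: bash -c "dpkg -l | awk '/nvidia-compute-utils-[0-9]+-server/{print $2}' | sort -V | tail -n 1"
  register: nvidia_driver_pkg
  changed_when: false

- name: Extract Nvidia driver version number
  ansible.builtin.set_fact:
    # default('', true) turns a "no match" result into an empty string
    nvidia_driver_version: "{{ nvidia_driver_pkg.stdout | regex_search('[0-9]+') | default('', true) }}"

- name: Install matching Nvidia server-utils package
  ansible.builtin.apt:
    name: "nvidia-utils-{{ nvidia_driver_version }}-server"
    state: present
    update_cache: true
  when: nvidia_driver_version | length > 0

- name: Reboot if the Nvidia driver was installed in this run
  ansible.builtin.reboot:
    msg: "Rebooting after Nvidia driver installation"
    pre_reboot_delay: 10
  when: nvidia_install is changed

- name: Check if nvidia-smi works
  ansible.builtin.command: nvidia-smi
  changed_when: false
```

On a re-run against an already configured host, the install and reboot tasks are both skipped, so the play reports no changes for this role.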
```yaml
- role: geerlingguy.docker
  vars:
    docker_users:
      - 'ubuntu'
```
Would this fail if we use a different user for ansible_ssh_user?
Yes. I can update the README so that future users of the playbook know to change the username accordingly.
I have updated the README, moved the value into a variable, and added comments to make this clear.
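For illustration, a sketch of passing the Docker user through a variable instead of hard-coding `'ubuntu'`. The variable name `koina_docker_user` and the play layout are assumptions, since the PR's actual variable name isn't shown here; `docker_users` is the geerlingguy.docker role variable already used above. The default falls back to `ubuntu` only when `ansible_user` is not set; an inventory that still uses the older `ansible_ssh_user` name would need to reference that variable instead.

```yaml
# Hypothetical play; `koina_docker_user` is an illustrative variable name.
- name: Configure host and deploy Koina
  hosts: all
  become: true
  vars:
    # Default to the user Ansible connects as; fall back to 'ubuntu'.
    koina_docker_user: "{{ ansible_user | default('ubuntu') }}"
  roles:
    - role: geerlingguy.docker
      vars:
        docker_users:
          - "{{ koina_docker_user }}"
```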
```yaml
# - name: Ensure Koina server is running by curl health endpoint
#   ansible.builtin.uri:
#     url: "http://localhost:{{ koinahttp_docker_port }}/v2/health/ready"
#     method: GET
#     return_content: true
#     status_code: 200
#   register: koina_health_check
#   retries: 3
#   delay: 10
#   until: koina_health_check.status == 200
```
Is there a reason this is included as comments?
Starting the server and loading all the models takes 15 to 20 minutes or more, so checking readiness in this task is not feasible; that is why I commented it out.
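If the readiness check is wanted later, one option, sketched under assumptions, is to keep the task but give it a retry budget longer than the 15 to 20 minute startup and make it opt-in. With `retries: 150` and `delay: 10`, Ansible keeps polling for roughly 150 x 10 s = 1500 s (about 25 minutes) before giving up. `koina_wait_for_ready` is a hypothetical toggle; the URL and `koinahttp_docker_port` come from the commented-out task above.

```yaml
# Opt-in readiness check; `koina_wait_for_ready` is a hypothetical variable.
- name: Wait for Koina server to report ready
  ansible.builtin.uri:
    url: "http://localhost:{{ koinahttp_docker_port }}/v2/health/ready"
    method: GET
    status_code: 200
  register: koina_health_check
  # Poll every 10 s, up to 150 retries (roughly 25 minutes).
  retries: 150
  delay: 10
  until: koina_health_check.status == 200
  when: koina_wait_for_ready | default(false) | bool
```

Running the playbook with `-e koina_wait_for_ready=true` would enable the wait; by default the task is skipped, matching the current behavior.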
This PR adds the Ansible playbook and roles to configure and deploy the Koina server.