Ansible allows much more control over the execution of a playbook by running tasks in parallel on all hosts. By default, Ansible forks up to five times, so it will run a particular task on up to five different machines at once. This value is set in the Ansible configuration file ansible.cfg.
```
[root@controlnode ansible]# grep forks ansible.cfg
#forks = 5
```
When there are a large number of managed hosts (more than five), the forks parameter can be changed to something more suitable for the environment. The default value can be overridden in the configuration file, or changed for a single run using the --forks option of the ansible-playbook and ansible commands.
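For example, the default could be raised permanently by editing ansible.cfg; a sketch (the value 20 is only illustrative, size it to your control node and host count). The same effect for a single run would be `ansible-playbook --forks 20 playbook.yml`.

```ini
# ansible.cfg - uncomment and raise the default fork count
# (20 is an illustrative value)
[defaults]
forks = 20
```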
Running tasks in parallel
For any specific play, you can use the serial keyword in a playbook to temporarily reduce the number of machines running in parallel from the fork count specified in the Ansible configuration file. The serial keyword is primarily used to control rolling updates.
Rolling updates
If a website is deployed on 100 web servers and only 10 of them should be updated at the same time, the serial keyword can be set to 10 in the playbook to reduce the number of simultaneous deployments (assuming that forks is set to something equal or higher). The serial keyword can also be specified as a percentage, which is applied to the total number of hosts in the play. If the number of hosts does not divide evenly into the number of passes, the final pass contains the remainder. Regardless of the percentage, the number of hosts per pass will always be 1 or greater.
```yaml
---
- name: Limit the number of hosts this play runs on at the same time
  hosts: appservers
  serial: 2
```
Regardless of the number of forks set, Ansible only spins up as many parallel task executions as there are hosts in the current batch of a play.
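Besides a fixed number or a percentage, serial also accepts a list of batch sizes, which is handy for rolling updates that start cautiously and grow as confidence builds; a sketch (the webservers group name is illustrative):

```yaml
---
# With 10 hosts in webservers, this play runs in passes of 1 host,
# then 3 hosts (30% of 10), then the remaining 6 (the final pass
# takes whatever is left).
- name: Roll out in growing batches
  hosts: webservers
  serial:
    - 1
    - "30%"
    - "100%"
```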
Asynchronous tasks
Some system operations take a while to complete. For example, downloading a large file or rebooting a server takes a long time. Using parallelism and forks, Ansible starts the command quickly on the managed hosts, then polls the hosts for status until they are all finished.
To run an operation asynchronously, use the async and poll keywords. The async keyword tells Ansible to run the job in the background so it can be checked later; its value is the maximum time, in seconds, that Ansible will wait for the command to complete. The value of poll indicates how often Ansible should check whether the command has completed. The default poll value is 10 seconds.
In the example below, the get_url module takes a long time to download a file; async: 3600 instructs Ansible to wait up to 3600 seconds for the task to complete, and poll: 10 sets the polling interval, in seconds, for checking whether the download is complete.
```yaml
---
- name: Long running task
  hosts: demoservers
  remote_user: devops
  tasks:
    - name: Download big file
      get_url: url=http://demo.example.com/bigfile.tar.gz
      async: 3600
      poll: 10
```
Deferring asynchronous tasks
Long-running operations or maintenance scripts can run alongside other tasks, with checks for completion deferred until later using the wait_for module. To configure Ansible not to wait for the job to complete, set the value of poll to 0, so that Ansible starts the command and, instead of polling for its completion, moves on to the next task.
```yaml
---
- name: Restart and wait until the server is rebooted
  hosts: demoservers
  remote_user: devops
  tasks:
    - name: restart machine
      shell: sleep 2 && shutdown -r now "Ansible updates triggered"
      async: 1
      poll: 0
      become: true
      ignore_errors: true

    - name: waiting for server to come back
      local_action:
        module: wait_for
        host: "{{ inventory_hostname }}"
        state: started
        delay: 30
        timeout: 300
      become: false
```
Note
Delegating a task to 127.0.0.1 can also be written using the shorthand local_action keyword, as in the example above.
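For comparison, the same check can be written without local_action by delegating explicitly; a sketch of the equivalent form:

```yaml
# Equivalent to the local_action form: the check runs on the control node
- name: waiting for server to come back
  wait_for:
    host: "{{ inventory_hostname }}"
    state: started
    delay: 30
    timeout: 300
  delegate_to: 127.0.0.1
  become: false
```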
For tasks that take an extremely long time to run, you can configure Ansible to wait for the job as long as it takes. To do this, set the value of async to 0.
Asynchronous task status
While an asynchronous task is running, you can also check its completion status by using the Ansible async_status module. The module requires the job identifier (ansible_job_id) as its parameter.
```yaml
---
# Async status - fire-forget.yml
- name: Async status with fire and forget task
  hosts: demoservers
  remote_user: devops
  become: true
  tasks:
    - name: Download big file
      get_url:
        url: http://mirror-pl.kielcetechnologypark.net/centos/7.8.2003/isos/x86_64/CentOS-7-x86_64-Everything-2003.iso
        dest: /tmp
      async: 3600
      poll: 0
      register: download_sleeper

    - name: Wait for download to finish
      async_status: "jid={{ download_sleeper.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 30
```
The output of the playbook when executed (the download fails here because the mirror hostname cannot be resolved; the Polish error message means "Name or service not known"):
```
[miro@controlnode paralellism]$ ansible-playbook async_status.yml

PLAY [Async status with fire and forget task] *******************************************************************

TASK [Gathering Facts] ******************************************************************************************
ok: [localhost]

TASK [Download big file] ****************************************************************************************
changed: [localhost]

TASK [Wait for download to finish] ******************************************************************************
FAILED - RETRYING: Wait for download to finish (30 retries left).
FAILED - RETRYING: Wait for download to finish (29 retries left).
FAILED - RETRYING: Wait for download to finish (28 retries left).
FAILED - RETRYING: Wait for download to finish (27 retries left).
fatal: [localhost]: FAILED! => {"ansible_job_id": "217519210774.5438", "attempts": 5, "changed": false, "dest": "/tmp", "finished": 1, "gid": 0, "group": "root", "mode": "01777", "msg": "Request failed: <urlopen error [Errno -2] Ta nazwa lub usługa jest nieznana>", "owner": "root", "secontext": "system_u:object_r:tmp_t:s0", "size": 4096, "state": "directory", "uid": 0, "url": "http://mirror-pl.kielcetechnologypark.net/centos/7.8.2003/isos/x86_64/CentOS-7-x86_64-Everything-2003.iso"}
	to retry, use: --limit @/home/miro/ansible/paralellism/async_status.retry

PLAY RECAP ******************************************************************************************************
localhost                  : ok=2    changed=1    unreachable=0    failed=1
```
Example
In this exercise, you will run a playbook that uses a script to perform a long-running process as an asynchronous task. Instead of waiting for the task to complete, you will check its status using the async_status module.
Create a script file named longfiles.j2 under the ~/paralellism/templates directory, with the following content.
```bash
#!/bin/bash
echo "emptying $2" > $2
for i in {00..30}; do
  echo "run $i, $1"
  echo "run $i for $1" >> $2
  sleep 1
done
```
In the playbook file async.yml, define a task to:
Copy the longfiles.j2 script from the templates directory to the managed host as /usr/local/bin/longfiles. Change the file owner and group to root, and change the permissions of the script to 0755.
Define a task and use the async keyword to:
Run the longfiles script copied previously to /usr/local/bin/longfiles with the arguments foo, bar, and baz and their corresponding output files /tmp/foo.file, /tmp/bar.file, and /tmp/baz.file respectively, for example: /usr/local/bin/longfiles foo /tmp/foo.file. Use the async keyword to wait for the task for up to 110 seconds, and set the value of poll to 0 so that Ansible starts the command and then lets it run in the background. Register a script_sleeper variable to store the completion status of the started command.
Define a task that uses the debug module to display the value stored in the script_sleeper variable. Then define a task in the playbook that keeps checking the status of the async task:
• Use the async_status module to check the status of each async task triggered previously, using the registered variable script_sleeper.results.
• Set the maximum retries to 30.
The completed async.yml playbook should have the following content:
```
[miro@controlnode paralellism]$ cat async.yml
# async.yml
- name: longfiles async playbook
  hosts: localhost
  remote_user: miro
  become: true
  tasks:
    - name: template longfiles script
      template:
        src: templates/longfiles.j2
        dest: /usr/local/bin/longfiles
        owner: root
        group: root
        mode: 0755

    - name: run longfiles script
      command: "/usr/local/bin/longfiles {{ item }} /tmp/{{ item }}.file"
      async: 110
      poll: 0
      with_items:
        - foo
        - bar
        - baz
      register: script_sleeper

    - name: show script_sleeper value
      debug:
        var: script_sleeper

    - name: check status of longfiles script
      async_status: "jid={{ item.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 30
      with_items: "{{ script_sleeper.results }}"
```
Check the syntax of the async.yml playbook. Correct any errors that you find.
```
[miro@controlnode paralellism]$ ansible-playbook --syntax-check async.yml

playbook: async.yml
```
Run the playbook async.yml.
Observe the job IDs listed as ansible_job_id for each job that was started in parallel using the async keyword. The task check status of longfiles script retries until the started jobs complete, using the async_status Ansible module.
```
[miro@controlnode paralellism]$ ansible-playbook async.yml

PLAY [longfiles async playbook] *********************************************************************************

TASK [Gathering Facts] ******************************************************************************************
ok: [localhost]

TASK [template longfiles script] ********************************************************************************
changed: [localhost]

TASK [run longfiles script] *************************************************************************************
changed: [localhost] => (item=foo)
changed: [localhost] => (item=bar)
changed: [localhost] => (item=baz)

TASK [show script_sleeper value] ********************************************************************************
ok: [localhost] => {
    "script_sleeper": {
        "changed": true,
        "msg": "All items completed",
        "results": [
            {
                "_ansible_ignore_errors": null,
                "_ansible_item_result": true,
                "_ansible_no_log": false,
                "_ansible_parsed": true,
                "ansible_job_id": "115147576181.5785",
                "changed": true,
                "failed": false,
                "finished": 0,
                "item": "foo",
                "results_file": "/root/.ansible_async/115147576181.5785",
                "started": 1
            },
            {
                "_ansible_ignore_errors": null,
                "_ansible_item_result": true,
                "_ansible_no_log": false,
                "_ansible_parsed": true,
                "ansible_job_id": "817627902564.5810",
                "changed": true,
                "failed": false,
                "finished": 0,
                "item": "bar",
                "results_file": "/root/.ansible_async/817627902564.5810",
                "started": 1
            },
            {
                "_ansible_ignore_errors": null,
                "_ansible_item_result": true,
                "_ansible_no_log": false,
                "_ansible_parsed": true,
                "ansible_job_id": "64020630152.5835",
                "changed": true,
                "failed": false,
                "finished": 0,
                "item": "baz",
                "results_file": "/root/.ansible_async/64020630152.5835",
                "started": 1
            }
        ]
    }
}

TASK [check status of longfiles script] *************************************************************************
FAILED - RETRYING: check status of longfiles script (30 retries left).
FAILED - RETRYING: check status of longfiles script (29 retries left).
FAILED - RETRYING: check status of longfiles script (28 retries left).
FAILED - RETRYING: check status of longfiles script (27 retries left).
FAILED - RETRYING: check status of longfiles script (26 retries left).
FAILED - RETRYING: check status of longfiles script (25 retries left).
changed: [localhost] => (item={'_ansible_parsed': True, '_ansible_item_result': True, '_ansible_no_log': False, u'ansible_job_id': u'115147576181.5785', 'failed': False, u'started': 1, 'changed': True, 'item': u'foo', u'finished': 0, u'results_file': u'/root/.ansible_async/115147576181.5785', '_ansible_ignore_errors': None})
changed: [localhost] => (item={'_ansible_parsed': True, '_ansible_item_result': True, '_ansible_no_log': False, u'ansible_job_id': u'817627902564.5810', 'failed': False, u'started': 1, 'changed': True, 'item': u'bar', u'finished': 0, u'results_file': u'/root/.ansible_async/817627902564.5810', '_ansible_ignore_errors': None})
changed: [localhost] => (item={'_ansible_parsed': True, '_ansible_item_result': True, '_ansible_no_log': False, u'ansible_job_id': u'64020630152.5835', 'failed': False, u'started': 1, 'changed': True, 'item': u'baz', u'finished': 0, u'results_file': u'/root/.ansible_async/64020630152.5835', '_ansible_ignore_errors': None})

PLAY RECAP ******************************************************************************************************
localhost                  : ok=5    changed=3    unreachable=0    failed=0
```
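Each async job leaves a results file under ~/.ansible_async on the managed host (the results_file fields in the output above). These can be removed once the jobs are finished using the async_status module's cleanup mode; a sketch, reusing the registered script_sleeper variable:

```yaml
# Remove the per-job cache files created by the async tasks
- name: clean up async job cache files
  async_status:
    jid: "{{ item.ansible_job_id }}"
    mode: cleanup
  with_items: "{{ script_sleeper.results }}"
```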
Example 2.
In this lab, you will deploy an upgraded web page, adding a new feature, using the serial keyword for rolling updates on two web servers, serverb.lab.example.com and tower.lab.example.com, running behind a load balancer. The HAProxy load balancer and the two web servers are preconfigured on workstation.lab.example.com, serverb.lab.example.com, and tower.lab.example.com respectively. After the upgrade of the web page, the web servers need to be rebooted one at a time before being added back to the load balancer pool, without affecting site availability, by using delegation.
Create a playbook named upgrade_webserver.yml. Use the hosts that are part of the webservers inventory group, and use privilege escalation with the remote user devops. As the updates need to be pushed to one server at a time, use the serial keyword with a value of 1.
Create a task to remove the web server from the load balancer pool, using the haproxy Ansible module to remove it from the HAProxy load balancer. The task needs to be delegated to the servers in the [lbserver] inventory group. The haproxy module disables a back end server in HAProxy using socket commands. To disable a back end server in the back end pool named app, with the socket path /var/lib/haproxy/stats, and wait=yes to wait until the server reports a status of maintenance, use the following:
```yaml
haproxy: state=disabled backend=app host={{ inventory_hostname }} socket=/var/lib/haproxy/stats wait=yes
```
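In a task, this one-line form is combined with delegation so that it runs against each load balancer; a sketch (the lbserver group comes from the lab inventory):

```yaml
# Disable this web server on every host in the lbserver group
- name: disable the server in haproxy
  haproxy: state=disabled backend=app host={{ inventory_hostname }} socket=/var/lib/haproxy/stats wait=yes
  delegate_to: "{{ item }}"
  with_items: "{{ groups.lbserver }}"
```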
Create a task to copy the updated page template from templates/index-ver1.html.j2 to the web server's document root as /var/www/html/index.html. Also register a variable, pageupgrade, which will be used later to invoke other tasks.
Create a task to restart the web servers using the command module. Use an asynchronous task that will not wait more than 1 second for completion, and set poll to 0 so the task is not polled for completion. Set ignore_errors to true, and execute the task only if the previously registered pageupgrade variable has changed.
Create a task to wait for the web server to be rebooted. Use the wait_for module and specify the host as inventory_hostname, the port as 22, the state as started, the delay as 25, and the timeout as 200. Use delegation to delegate the task to the local machine. The task should be executed when the variable pageupgrade has changed. Privilege escalation is not required for this task.
Create a task to wait for the web server port to be started. On the preconfigured web servers, the httpd service is configured to start at boot time. Specify the host as inventory_hostname, the port as 80, the state as started, and the timeout as 20.
Create a final task to add the web server back to the load balancer pool after the upgrade of the web page. Use the haproxy Ansible module to add the server back to the HAProxy load balancer pool. The task needs to be delegated to all members of the [lbserver] inventory group. The haproxy module enables a back end server in HAProxy using socket commands. To enable a back end server in the back end pool named app, with the socket path /var/lib/haproxy/stats, and wait=yes to wait until the server reports a healthy status, use the following:
```yaml
haproxy: state=enabled backend=app host={{ inventory_hostname }} socket=/var/lib/haproxy/stats wait=yes
```
The following should be the contents of upgrade_webserver.yml:
```yaml
---
- name: Upgrade Webservers
  hosts: webservers
  remote_user: devops
  become: yes
  serial: 1
  tasks:
    - name: disable the server in haproxy
      haproxy:
        state: disabled
        backend: app
        host: "{{ inventory_hostname }}"
        socket: /var/lib/haproxy/stats
        wait: yes
      delegate_to: "{{ item }}"
      with_items: "{{ groups.lbserver }}"

    - name: upgrade the page
      template:
        src: "templates/index-ver1.html.j2"
        dest: "/var/www/html/index.html"
      register: pageupgrade

    - name: restart machine
      command: shutdown -r now "Ansible updates triggered"
      async: 1
      poll: 0
      ignore_errors: true
      when: pageupgrade.changed

    - name: wait for webserver to restart
      wait_for:
        host: "{{ inventory_hostname }}"
        port: 22
        state: started
        delay: 25
        timeout: 200
      become: False
      delegate_to: 127.0.0.1
      when: pageupgrade.changed

    - name: wait for webserver to come up
      wait_for:
        host: "{{ inventory_hostname }}"
        port: 80
        state: started
        timeout: 20

    - name: enable the server in haproxy
      haproxy:
        state: enabled
        backend: app
        host: "{{ inventory_hostname }}"
        socket: /var/lib/haproxy/stats
        wait: yes
      delegate_to: "{{ item }}"
      with_items: "{{ groups.lbserver }}"
```