Ansible allows much more control over the execution of a playbook by running tasks in parallel on all hosts. By default, Ansible forks up to five times, so it will run a particular task on up to five different machines at once. This value is set in the Ansible configuration file ansible.cfg.
```
[root@controlnode ansible]# grep forks ansible.cfg
#forks = 5
```
When there are a large number of managed hosts (more than five), the forks parameter can be changed to something more suitable for the environment. The default value can be overridden in the configuration file, or changed for a single run using the --forks option of the ansible-playbook and ansible commands.
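For example, the default could be raised permanently by editing ansible.cfg; a sketch (the value 20 is only illustrative, size it to your control node and host count). The same effect for a single run would be `ansible-playbook --forks 20 playbook.yml`.

```ini
# ansible.cfg - uncomment and raise the default fork count
# (20 is an illustrative value)
[defaults]
forks = 20
```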
Running tasks in parallel
For any specific play, you can use the serial keyword in a playbook to temporarily reduce the number of machines running in parallel from the fork count specified in the Ansible configuration file. The serial keyword is primarily used to control rolling updates.
Rolling updates
If a website is deployed on 100 web servers and only 10 of them should be updated at the same time, the serial keyword can be set to 10 in the playbook to reduce the number of simultaneous deployments (assuming that forks is set to something equal or higher). The serial keyword can also be specified as a percentage, which is applied to the total number of hosts in the play. If the number of hosts does not divide evenly into the number of passes, the final pass contains the remainder. Regardless of the percentage, the number of hosts per pass will always be 1 or greater.
```yaml
---
- name: Limit the number of hosts this play runs on at the same time
  hosts: appservers
  serial: 2
```
Regardless of the number of forks set, Ansible only spins up as many parallel task executions as there are hosts in the current batch of a play.
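Besides a fixed number or a percentage, serial also accepts a list of batch sizes, which is handy for rolling updates that start cautiously and grow as confidence builds; a sketch (the webservers group name is illustrative):

```yaml
---
# With 10 hosts in webservers, this play runs in passes of 1 host,
# then 3 hosts (30% of 10), then the remaining 6 (the final pass
# takes whatever is left).
- name: Roll out in growing batches
  hosts: webservers
  serial:
    - 1
    - "30%"
    - "100%"
```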
Asynchronous tasks
Some system operations take a while to complete. For example, downloading a large file or rebooting a server takes a long time. Using parallelism and forks, Ansible starts the command quickly on the managed hosts, then polls the hosts for status until they are all finished.
To run an operation asynchronously, use the async and poll keywords. The async keyword tells Ansible to run the job in the background so it can be checked later; its value is the maximum time, in seconds, that Ansible will wait for the command to complete. The value of poll indicates how often Ansible should check whether the command has completed. The default poll value is 10 seconds.
In the example below, the get_url module takes a long time to download a file; async: 3600 instructs Ansible to wait up to 3600 seconds for the task to complete, and poll: 10 sets the polling interval, in seconds, for checking whether the download is complete.
```yaml
---
- name: Long running task
  hosts: demoservers
  remote_user: devops
  tasks:
    - name: Download big file
      get_url: url=http://demo.example.com/bigfile.tar.gz
      async: 3600
      poll: 10
```
Deferring asynchronous tasks
Long-running operations or maintenance scripts can run alongside other tasks, with checks for completion deferred until later using the wait_for module. To configure Ansible not to wait for the job to complete, set the value of poll to 0, so that Ansible starts the command and, instead of polling for its completion, moves on to the next task.
```yaml
---
- name: Restart and wait until the server is rebooted
  hosts: demoservers
  remote_user: devops
  tasks:
    - name: restart machine
      shell: sleep 2 && shutdown -r now "Ansible updates triggered"
      async: 1
      poll: 0
      become: true
      ignore_errors: true

    - name: waiting for server to come back
      local_action:
        module: wait_for
        host: "{{ inventory_hostname }}"
        state: started
        delay: 30
        timeout: 300
      become: false
```
Note
Delegating a task to 127.0.0.1 can also be written using the shorthand local_action keyword, as in the example above.
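For comparison, the same check can be written without local_action by delegating explicitly; a sketch of the equivalent form:

```yaml
# Equivalent to the local_action form: the check runs on the control node
- name: waiting for server to come back
  wait_for:
    host: "{{ inventory_hostname }}"
    state: started
    delay: 30
    timeout: 300
  delegate_to: 127.0.0.1
  become: false
```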
For tasks that take an extremely long time to run, you can configure Ansible to wait for the job as long as it takes. To do this, set the value of async to 0.
Asynchronous task status
While an asynchronous task is running, you can also check its completion status by using the Ansible async_status module. The module requires the job identifier (ansible_job_id) as its parameter.
```yaml
---
# Async status - fire-forget.yml
- name: Async status with fire and forget task
  hosts: demoservers
  remote_user: devops
  become: true
  tasks:
    - name: Download big file
      get_url:
        url: http://mirror-pl.kielcetechnologypark.net/centos/7.8.2003/isos/x86_64/CentOS-7-x86_64-Everything-2003.iso
        dest: /tmp
      async: 3600
      poll: 0
      register: download_sleeper

    - name: Wait for download to finish
      async_status: "jid={{ download_sleeper.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 30
```
The output of the playbook when executed (the download fails here because the mirror hostname cannot be resolved; the Polish error message means "Name or service not known"):
```
[miro@controlnode paralellism]$ ansible-playbook async_status.yml

PLAY [Async status with fire and forget task] *******************************************************************

TASK [Gathering Facts] ******************************************************************************************
ok: [localhost]

TASK [Download big file] ****************************************************************************************
changed: [localhost]

TASK [Wait for download to finish] ******************************************************************************
FAILED - RETRYING: Wait for download to finish (30 retries left).
FAILED - RETRYING: Wait for download to finish (29 retries left).
FAILED - RETRYING: Wait for download to finish (28 retries left).
FAILED - RETRYING: Wait for download to finish (27 retries left).
fatal: [localhost]: FAILED! => {"ansible_job_id": "217519210774.5438", "attempts": 5, "changed": false, "dest": "/tmp", "finished": 1, "gid": 0, "group": "root", "mode": "01777", "msg": "Request failed: <urlopen error [Errno -2] Ta nazwa lub usługa jest nieznana>", "owner": "root", "secontext": "system_u:object_r:tmp_t:s0", "size": 4096, "state": "directory", "uid": 0, "url": "http://mirror-pl.kielcetechnologypark.net/centos/7.8.2003/isos/x86_64/CentOS-7-x86_64-Everything-2003.iso"}
	to retry, use: --limit @/home/miro/ansible/paralellism/async_status.retry

PLAY RECAP ******************************************************************************************************
localhost                  : ok=2    changed=1    unreachable=0    failed=1
```
Example
In this exercise, you will run a playbook that uses a script to perform a long-running process as an asynchronous task. Instead of waiting for the task to complete, you will check its status using the async_status module.
Create a script file named longfiles.j2 under the ~/paralellism/templates directory, with the following content.
```bash
#!/bin/bash
echo "emptying $2" > $2
for i in {00..30}; do
  echo "run $i, $1"
  echo "run $i for $1" >> $2
  sleep 1
done
```
In the playbook file async.yml, define a task to:
Copy the longfiles.j2 script from the templates directory to the managed host as /usr/local/bin/longfiles. Change the file owner and group to root, and change the permissions of the script to 0755.
Define a task and use the async keyword to:
Run the longfiles script copied previously to /usr/local/bin/longfiles with the arguments foo, bar, and baz and their corresponding output files /tmp/foo.file, /tmp/bar.file, and /tmp/baz.file respectively, for example: /usr/local/bin/longfiles foo /tmp/foo.file. Use the async keyword to wait for the task for up to 110 seconds, and set the value of poll to 0 so that Ansible starts the command and then lets it run in the background. Register a script_sleeper variable to store the completion status of the started command.
Define a task that uses the debug module to display the value stored in the script_sleeper variable. Then define a task in the playbook that keeps checking the status of the async task:
• Use the async_status module to check the status of each async task triggered previously, using the registered variable script_sleeper.results.
• Set the maximum retries to 30.
The completed async.yml playbook should have the following content:
```
[miro@controlnode paralellism]$ cat async.yml
# async.yml
- name: longfiles async playbook
  hosts: localhost
  remote_user: miro
  become: true
  tasks:
    - name: template longfiles script
      template:
        src: templates/longfiles.j2
        dest: /usr/local/bin/longfiles
        owner: root
        group: root
        mode: 0755

    - name: run longfiles script
      command: "/usr/local/bin/longfiles {{ item }} /tmp/{{ item }}.file"
      async: 110
      poll: 0
      with_items:
        - foo
        - bar
        - baz
      register: script_sleeper

    - name: show script_sleeper value
      debug:
        var: script_sleeper

    - name: check status of longfiles script
      async_status: "jid={{ item.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 30
      with_items: "{{ script_sleeper.results }}"
```
Check the syntax of the async.yml playbook. Correct any errors that you find.
```
[miro@controlnode paralellism]$ ansible-playbook --syntax-check async.yml

playbook: async.yml
```
Run the playbook async.yml.
Observe the job IDs listed as ansible_job_id for each job that was started in parallel using the async keyword. The task check status of longfiles script retries until the started jobs complete, using the async_status Ansible module.
```
[miro@controlnode paralellism]$ ansible-playbook async.yml

PLAY [longfiles async playbook] *********************************************************************************

TASK [Gathering Facts] ******************************************************************************************
ok: [localhost]

TASK [template longfiles script] ********************************************************************************
changed: [localhost]

TASK [run longfiles script] *************************************************************************************
changed: [localhost] => (item=foo)
changed: [localhost] => (item=bar)
changed: [localhost] => (item=baz)

TASK [show script_sleeper value] ********************************************************************************
ok: [localhost] => {
    "script_sleeper": {
        "changed": true,
        "msg": "All items completed",
        "results": [
            {
                "_ansible_ignore_errors": null,
                "_ansible_item_result": true,
                "_ansible_no_log": false,
                "_ansible_parsed": true,
                "ansible_job_id": "115147576181.5785",
                "changed": true,
                "failed": false,
                "finished": 0,
                "item": "foo",
                "results_file": "/root/.ansible_async/115147576181.5785",
                "started": 1
            },
            {
                "_ansible_ignore_errors": null,
                "_ansible_item_result": true,
                "_ansible_no_log": false,
                "_ansible_parsed": true,
                "ansible_job_id": "817627902564.5810",
                "changed": true,
                "failed": false,
                "finished": 0,
                "item": "bar",
                "results_file": "/root/.ansible_async/817627902564.5810",
                "started": 1
            },
            {
                "_ansible_ignore_errors": null,
                "_ansible_item_result": true,
                "_ansible_no_log": false,
                "_ansible_parsed": true,
                "ansible_job_id": "64020630152.5835",
                "changed": true,
                "failed": false,
                "finished": 0,
                "item": "baz",
                "results_file": "/root/.ansible_async/64020630152.5835",
                "started": 1
            }
        ]
    }
}

TASK [check status of longfiles script] *************************************************************************
FAILED - RETRYING: check status of longfiles script (30 retries left).
FAILED - RETRYING: check status of longfiles script (29 retries left).
FAILED - RETRYING: check status of longfiles script (28 retries left).
FAILED - RETRYING: check status of longfiles script (27 retries left).
FAILED - RETRYING: check status of longfiles script (26 retries left).
FAILED - RETRYING: check status of longfiles script (25 retries left).
changed: [localhost] => (item={'_ansible_parsed': True, '_ansible_item_result': True, '_ansible_no_log': False, u'ansible_job_id': u'115147576181.5785', 'failed': False, u'started': 1, 'changed': True, 'item': u'foo', u'finished': 0, u'results_file': u'/root/.ansible_async/115147576181.5785', '_ansible_ignore_errors': None})
changed: [localhost] => (item={'_ansible_parsed': True, '_ansible_item_result': True, '_ansible_no_log': False, u'ansible_job_id': u'817627902564.5810', 'failed': False, u'started': 1, 'changed': True, 'item': u'bar', u'finished': 0, u'results_file': u'/root/.ansible_async/817627902564.5810', '_ansible_ignore_errors': None})
changed: [localhost] => (item={'_ansible_parsed': True, '_ansible_item_result': True, '_ansible_no_log': False, u'ansible_job_id': u'64020630152.5835', 'failed': False, u'started': 1, 'changed': True, 'item': u'baz', u'finished': 0, u'results_file': u'/root/.ansible_async/64020630152.5835', '_ansible_ignore_errors': None})

PLAY RECAP ******************************************************************************************************
localhost                  : ok=5    changed=3    unreachable=0    failed=0
```
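Each async job leaves a results file under ~/.ansible_async on the managed host (the results_file fields in the output above). These can be removed once the jobs are finished using the async_status module's cleanup mode; a sketch, reusing the registered script_sleeper variable:

```yaml
# Remove the per-job cache files created by the async tasks
- name: clean up async job cache files
  async_status:
    jid: "{{ item.ansible_job_id }}"
    mode: cleanup
  with_items: "{{ script_sleeper.results }}"
```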
Example 2.
In this lab, you will deploy an upgraded web page, adding a new feature, using the serial keyword for rolling updates on two web servers, serverb.lab.example.com and tower.lab.example.com, running behind a load balancer. The HAProxy load balancer and the two web servers are preconfigured on workstation.lab.example.com, serverb.lab.example.com, and tower.lab.example.com respectively. After the upgrade of the web page, the web servers need to be rebooted one at a time before being added back to the load balancer pool, without affecting site availability, by using delegation.
Create a playbook named upgrade_webserver.yml. Use the hosts that are part of the webservers inventory group, and use privilege escalation with the remote user devops. As the updates need to be pushed to one server at a time, use the serial keyword with a value of 1.
Create a task to remove the web server from the load balancer pool, using the haproxy Ansible module to remove it from the HAProxy load balancer. The task needs to be delegated to the servers in the [lbserver] inventory group. The haproxy module disables a back end server in HAProxy using socket commands. To disable a back end server in the back end pool named app, with the socket path /var/lib/haproxy/stats, and wait=yes to wait until the server reports a status of maintenance, use the following:
```yaml
haproxy: state=disabled backend=app host={{ inventory_hostname }} socket=/var/lib/haproxy/stats wait=yes
```
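In a task, this one-line form is combined with delegation so that it runs against each load balancer; a sketch (the lbserver group comes from the lab inventory):

```yaml
# Disable this web server on every host in the lbserver group
- name: disable the server in haproxy
  haproxy: state=disabled backend=app host={{ inventory_hostname }} socket=/var/lib/haproxy/stats wait=yes
  delegate_to: "{{ item }}"
  with_items: "{{ groups.lbserver }}"
```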
Create a task to copy the updated page template from templates/index-ver1.html.j2 to the web server's document root as /var/www/html/index.html. Also register a variable, pageupgrade, which will be used later to invoke other tasks.
Create a task to restart the web servers using the command module. Use an asynchronous task that will not wait more than 1 second for completion, and set poll to 0 so the task is not polled for completion. Set ignore_errors to true, and execute the task only if the previously registered pageupgrade variable has changed.
Create a task to wait for the web server to be rebooted. Use the wait_for module and specify the host as inventory_hostname, the port as 22, the state as started, the delay as 25, and the timeout as 200. Use delegation to delegate the task to the local machine. The task should be executed when the variable pageupgrade has changed. Privilege escalation is not required for this task.
Create a task to wait for the web server port to be started. On the preconfigured web servers, the httpd service is configured to start at boot time. Specify the host as inventory_hostname, the port as 80, the state as started, and the timeout as 20.
Create a final task to add the web server back to the load balancer pool after the upgrade of the web page. Use the haproxy Ansible module to add the server back to the HAProxy load balancer pool. The task needs to be delegated to all members of the [lbserver] inventory group. The haproxy module enables a back end server in HAProxy using socket commands. To enable a back end server in the back end pool named app, with the socket path /var/lib/haproxy/stats, and wait=yes to wait until the server reports a healthy status, use the following:
```yaml
haproxy: state=enabled backend=app host={{ inventory_hostname }} socket=/var/lib/haproxy/stats wait=yes
```
The following should be the contents of upgrade_webserver.yml:
```yaml
---
- name: Upgrade Webservers
  hosts: webservers
  remote_user: devops
  become: yes
  serial: 1
  tasks:
    - name: disable the server in haproxy
      haproxy:
        state: disabled
        backend: app
        host: "{{ inventory_hostname }}"
        socket: /var/lib/haproxy/stats
        wait: yes
      delegate_to: "{{ item }}"
      with_items: "{{ groups.lbserver }}"

    - name: upgrade the page
      template:
        src: "templates/index-ver1.html.j2"
        dest: "/var/www/html/index.html"
      register: pageupgrade

    - name: restart machine
      command: shutdown -r now "Ansible updates triggered"
      async: 1
      poll: 0
      ignore_errors: true
      when: pageupgrade.changed

    - name: wait for webserver to restart
      wait_for:
        host: "{{ inventory_hostname }}"
        port: 22
        state: started
        delay: 25
        timeout: 200
      become: False
      delegate_to: 127.0.0.1
      when: pageupgrade.changed

    - name: wait for webserver to come up
      wait_for:
        host: "{{ inventory_hostname }}"
        port: 80
        state: started
        timeout: 20

    - name: enable the server in haproxy
      haproxy:
        state: enabled
        backend: app
        host: "{{ inventory_hostname }}"
        socket: /var/lib/haproxy/stats
        wait: yes
      delegate_to: "{{ item }}"
      with_items: "{{ groups.lbserver }}"
```