October 27, 2021
Given a Bash command substitution containing pipes within a GitLab CI/CD pipeline (using Runner version 14.4.0-rc1) the order of execution of some of the piped commands seemed unreliable. Consider the following to grab fixed values from a specifically formatted commit message, which is poor to begin with, but demonstrates the point just as well:
variables:
MESSAGE: |
git@example.com:group/project
subscription: sub1
location: loc1
.shared-scripts:
set-vars: &set-vars
- >
SUBSCRIPTION=$(echo "$MESSAGE" | tail -n 2 | head -n 1 | cut -d " " -f 2);
LOCATION=$(echo "$MESSAGE" | tail -n 1 | cut -d " " -f 2);
anchored-job:
before_script:
- *set-vars
script:
- echo "'${SUBSCRIPTION}'" # does not work
- echo "'${LOCATION}'" # does not work
regular-job-tail-first:
before_script:
- >
SUBSCRIPTION=$(echo "$MESSAGE" | tail -n 2 | head -n 1 | cut -d " " -f 2);
LOCATION=$(echo "$MESSAGE" | tail -n 1 | cut -d " " -f 2);
script:
- echo "'${SUBSCRIPTION}'" # does not work
- echo "'${LOCATION}'" # does not work
regular-job-head-first:
before_script:
- >
SUBSCRIPTION=$(echo "$MESSAGE" | head -n 2 | tail -n 1 | cut -d " " -f 2);
LOCATION=$(echo "$MESSAGE" | tail -n 1 | cut -d " " -f 2);
script:
- echo "'${SUBSCRIPTION}'" # works
- echo "'${LOCATION}'" # does not work
regular-job-more-pipes:
before_script:
- PROJECT=$(echo "$MESSAGE" | head -n 1 | cut -d ":" -f 2 | rev | cut -d "/" -f 1 | rev)
script:
- echo "'${PROJECT}'" # works
While running this locally the result is deterministic, but within a CI job looking at the debug output it’s apparent that the order in which the pipes are ran is not what is expected when the first command is tail
:
--- echo 'git@example.com:group/project
subscription: sub1
location: loc1
'
--- head -n 1
--- tail -n 2
--- cut -d ' ' -f 2
++ SUBSCRIPTION=loc1
Key piece is that SUBSCRIPTION
should not be loc1
, but rather sub1
, because it should execute tail
first and then head
, not the other way around as it will end up targeting a totally different row in the given input. The same can be observed for the LOCATION
variable:
--- echo 'git@example.com:group/project
subscription: sub1
location: loc1
'
--- cut -d ' ' -f 2
--- tail -n 1
++ LOCATION=
It seems to me that the parser, or whatever is responsible, is acting differently when the first command is tail
. This was easily remedied with switching to grep
, which made more sense anyway and relying on only one pipe, but it’s left me wary of using multiple piped operations in the future:
.shared-scripts:
set-vars: &set-vars
- >
SUBSCRIPTION=$(grep "subscription" <<< "$MESSAGE" | cut -d " " -f 2);
LOCATION=$(grep "location" <<< "$MESSAGE" | cut -d " " -f 2);
Oddly enough something like this works just fine, so maybe the root cause lies in using those specific commands (i.e. head
and tail
):
job1:
before_script:
- >
EMAIL=$(echo "$CI_COMMIT_AUTHOR" | rev | cut -d " " -f 1 | rev);
export EMAIL=${EMAIL:1:-1};
I’ve written up an issue for GitLab that hopefully will get a pair of more knowledgeable eyes on this matter.
You can read the comment where I explain a bit more, but long story short: it can be crucial to take into account the block chomping indicators when working with multi-line variables as not stripping newlines from the end, while outputting just the same with echo
, has the potential to throw off any piped operations that depend on the number of lines starting from the end of the variable.