/bin/bash - Proper Whitespace Handling - Whitespace Safety - End-of-Options Parameter Security

From Kicksecure

Supporting multiple command line parameters with spaces in wrapper scripts, safe output handling, and use of the end-of-options parameter (--) for better safety.

Summary

  • Shell options: Use set -o errexit, set -o nounset, set -o errtrace, and set -o pipefail.
  • Quoted expansions: Quote variable expansions and prefer ${variable} style.
  • Array-based command building: In Bash, build command lines with arrays. In POSIX sh, use set --.
  • End-of-options marker: After making sure the command supports it, place the end-of-options parameter -- after all options and before positional parameters.
  • Long options: Prefer long option names over short flags when sensible. For example, prefer rm --force over rm -f.
  • Safe output: Do not use echo. Use printf with an explicit format string. Preferably use printf '%s\n' "text here", or use stecho for unicode-safe terminal output when stecho is available.
  • Safe line input: Use IFS= read -r for line-oriented input.
  • Checking unset variables: Under nounset, check explicitly whether variables exist.
  • Local variable declaration: Declare local variables first, then assign to them on a separate line.
  • Dynamic scoping awareness: Remember that Bash local uses dynamic scoping, and do not localize BASH_REMATCH.
  • Loop subshell avoidance: Avoid piping into while read loops, because that can create a subshell.
  • Stdin protection: Prevent stdin stealing inside loops by using a separate file descriptor when needed.
  • NUL-delimited input: Use NUL-delimited input where required, for example with find -files0-from.
  • Pipefail caution: Be careful with pipefail when piping into early-exiting consumers such as grep --quiet.
  • Readable project style: Prefer explicit, readable, whitespace-safe code and project helper functions where appropriate.
  • Indent style:
    • No tabs.
    • Newer scripts: Two (2) spaces to indent bash scripts.
    • "Legacy" scripts: 3 spaces to indent bash scripts.
    • Do not change indent style from 3 to 2, unless doing that in a separate clean commit that changes only whitespace.
    • AI agents should generally not apply major indent style changes unless explicitly told to.
  • No trailing whitespace: Use a proper editor and ensure that no trailing whitespace is left at the end of lines.
  • Self-identification: All programs and scripts should identify themselves in their output, either in their first and last output lines only, or by prefixing every line. For example: printf '%s\n' "$0: ERROR: ...". This helps bug reporters and developers pinpoint the source code repository and source code file in which an error might have occurred.
  • No editor configuration inside source files: For example, no vim modelines, for security reasons.

Safe ways to print


For this style guide, do not use echo. Use printf with an explicit format string instead.

Please note that printf does not have a default format specifier. The first positional parameter is always treated as the format string. When the format is omitted, untrusted data can be interpreted as format directives or backslash escapes. It is always recommended to be explicit about the format being used.
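A minimal sketch of the difference, assuming a hypothetical variable untrusted_text whose value contains a format directive and a backslash escape. The value and the variable names unsafe_output and safe_output are made up for this demonstration:

```bash
#!/bin/bash

set -o errexit
set -o nounset
set -o errtrace
set -o pipefail

## Hypothetical untrusted input (an assumption for this demo). It contains
## a format directive (%s) and a backslash escape (\t).
untrusted_text='progress: 100%s\tdone'

## Unsafe: the data itself is used as the format string.
## printf consumes '%s' (no argument, so it expands to nothing) and turns
## '\t' into a tab character.
# shellcheck disable=SC2059
unsafe_output="$(printf "${untrusted_text}")"

## Safe: fixed format string, data passed as an argument.
safe_output="$(printf '%s' "${untrusted_text}")"

printf '%s\n' "unsafe: ${unsafe_output}"
printf '%s\n' "safe:   ${safe_output}"
```

Only the safe variant reproduces the input byte for byte.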

Normally, there is no need to interpret escape sequences from a variable. Therefore, use the printf format specifier %s when the data is not printed to a terminal:

var="$(printf '%s' "${untrusted_text}")"

printf '%s\n' "message here" is the usual replacement for echo "message here".

If you require escapes to be interpreted, interpret them on a per-need basis:

red="$(printf '%b' "\e[31m")" # red=$'\e[31m' # printf -v red '%b' "\e[31m"
nocolor="$(printf '%b' "\e[m")" # nocolor=$'\e[m' # printf -v nocolor '%b' "\e[m"

Escapes that are already interpreted can then be printed with %s:

var="$(printf '%s' "${red} ${untrusted_text} ${nocolor}")"

This is why stecho should be used when printing to the terminal: it sanitizes unsafe characters (unicode). Using printf '%s' alone is not sufficient when escapes are already interpreted:

stecho "${red} ${untrusted_text} ${nocolor}"
printf '%s' "${red} ${untrusted_text} ${nocolor}" | stecho
printf '%s' "${red} ${untrusted_text} ${nocolor}" | stecho | less -R

Rule of thumb:

  • echo: Never.
  • printf: Whenever the printed data is not used by a terminal.
    • Format %b: Only for trusted data or fixed literals.
    • Format %s: With any data.
  • stecho: Whenever the printed data is used by a terminal.
    • When not using stecho: When stecho cannot reasonably be considered available, such as during early build steps when building Kicksecure from source code using derivative-maker.

Bash Proper Whitespace Handling

  • Quote variables.
  • Build parameters using arrays.
  • Enforce nounset.
  • Use end-of-options.
  • Style: use long option names.
#!/bin/bash

## https://yakking.branchable.com/posts/whitespace-safety/

#set -x
set -o errexit
set -o nounset
set -o errtrace
set -o pipefail

lib_dir="/tmp/test/lib/program with space/something spacy"
main_app_dir="/tmp/test/home/user/folder with space/abc"

mkdir --parents -- "${lib_dir}"
mkdir --parents -- "${main_app_dir}"

declare -a cmd_list

cmd_list+=("cp")
cmd_list+=("--recursive")
cmd_list+=("--")
cmd_list+=("${lib_dir}")
cmd_list+=("${main_app_dir}/")

printf '%s\n' "cmd_list has ${#cmd_list[@]} items"

## Execution example.
"${cmd_list[@]}"

## 'for' loop example.
for cmd_item in "${cmd_list[@]}"; do
    printf '%s\n' "cmd_item: '$cmd_item'"
done

## Alternative.
cmd_alt_list=(
    cp               ## program
    --recursive      ## recursive
    --               ## stop option parsing (protects against paths that begin with '-')
    "$lib_dir"       ## source directory
    "$main_app_dir/" ## destination
)

## 'for' loop example.
for cmd_alt_item in "${cmd_alt_list[@]}"; do
    printf '%s\n' "cmd_alt_item: '$cmd_alt_item'"
done

Why nounset


Without nounset, an unset variable silently expands to an empty string. That can turn a dangerous path into something unintended.

rm -- "/$UNSET_VAR"

If UNSET_VAR is unset and nounset is disabled, this becomes:

rm -- "/"

On many systems that will fail with an error such as:

rm: cannot remove '/': Is a directory

That specific command happens to fail here, but the pattern is still unsafe. With set -o nounset, the shell aborts earlier before running rm.

Setting UNSET_VAR="" would not solve the general problem either. Variables that may intentionally be empty should be handled explicitly.
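The behavior can be sketched without touching any files. In the following demonstration, printf stands in for rm, the risky expansion runs in a child bash so that its abort does not end the demo, and OPTIONAL_SUFFIX is a hypothetical variable name that is not part of the original text:

```bash
#!/bin/bash

set -o errexit
set -o nounset
set -o errtrace
set -o pipefail

## Make sure the demo variables really are unset, also in the environment
## that child processes inherit.
unset -v UNSET_VAR OPTIONAL_SUFFIX

## Run the risky expansion in a child bash so that its abort does not end
## this script. printf stands in for rm to keep the demo harmless.
if bash -c 'set -o nounset; printf "%s\n" "/${UNSET_VAR}"' 2>/dev/null; then
  nounset_result="expanded"
else
  nounset_result="aborted"
fi
printf '%s\n' "with nounset: ${nounset_result}"

## A variable that may intentionally be empty is handled explicitly.
## OPTIONAL_SUFFIX is a hypothetical name used only for this demo.
optional_suffix="${OPTIONAL_SUFFIX:-}"
if [ -z "${optional_suffix}" ]; then
  suffix_state="empty or unset"
else
  suffix_state="set"
fi
printf '%s\n' "OPTIONAL_SUFFIX is ${suffix_state}"
```

The ${variable:-} form makes the "may be empty" decision visible at the point of use instead of relying on silent expansion.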

local


Error swallowing


Note:

local testvar=$(false)

Expected: error

Actual: no error

When declaration and assignment are combined on the same line, local itself returns success and masks the failing command substitution.

Better:

local testvar
testvar=$(false)
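The difference can be observed by recording the exit status right after each form. This is a sketch only: the function names and the variables combined_status and separate_status are made up for the demonstration, and errexit is deliberately left off so that both statuses can be recorded instead of aborting the script:

```bash
#!/bin/bash

## errexit is intentionally not enabled here, so that both exit statuses
## can be recorded instead of aborting the script.
set -o nounset

combined_declaration() {
  ## Declaration and assignment on one line. shellcheck flags this
  ## (SC2155). It is kept on purpose for the demonstration.
  # shellcheck disable=SC2155
  local testvar=$(false)
  combined_status="$?"
}

separate_declaration() {
  ## Declaration first, assignment on a separate line.
  local testvar
  testvar=$(false)
  separate_status="$?"
}

combined_declaration
separate_declaration

printf '%s\n' "combined_status: ${combined_status}"
printf '%s\n' "separate_status: ${separate_status}"
```

The combined form reports 0 because the status is that of local. The separate form reports 1, the status of the failing command substitution, which is what errexit needs to see.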

Dynamic scoping


local variables in Bash use dynamic scoping. That means nested function calls can still read and modify them unless they declare their own local variable.

Example:

fn_01 () {
  local myvar
  myvar='supposedly local'
  printf '%s\n' "in fn_01, myvar is $myvar"
  fn_02
  printf '%s\n' "in fn_01, myvar is now $myvar"
}

fn_02 () {
  printf '%s\n' "in fn_02, myvar is $myvar"
  myvar='not so local after all'
  printf '%s\n' "in fn_02, myvar is now $myvar"
}

fn_01

Output:

in fn_01, myvar is supposedly local
in fn_02, myvar is supposedly local
in fn_02, myvar is now not so local after all
in fn_01, myvar is now not so local after all

To avoid problems from this, declare all function-local variables as local at the head of a function. For example:

fn_01 () {
  local myvar
  myvar='local to fn_01'
  printf '%s\n' "in fn_01, myvar is $myvar"
  fn_02
  printf '%s\n' "in fn_01, myvar is now $myvar"
}

fn_02 () {
  local myvar
  myvar='local to fn_02'
  printf '%s\n' "in fn_02, myvar is $myvar"
}

fn_01

Output:

in fn_01, myvar is local to fn_01
in fn_02, myvar is local to fn_02
in fn_01, myvar is now local to fn_01

BASH_REMATCH


Do not local -a BASH_REMATCH!

Note specifically: Bash sets BASH_REMATCH in the global scope; declaring it as a local variable will lead to unexpected results. (Source: GNU Bash manual)
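One way to keep match results function-local without localizing BASH_REMATCH is to copy the needed capture groups into ordinary local variables right after the match. A minimal sketch, where parse_version and its variable names are hypothetical and exist only for this demonstration:

```bash
#!/bin/bash

set -o errexit
set -o nounset
set -o errtrace
set -o pipefail

## parse_version is a hypothetical helper used only for this demo.
parse_version() {
  local version_string major minor
  version_string="$1"
  if ! [[ "${version_string}" =~ ^([0-9]+)\.([0-9]+)$ ]]; then
    return 1
  fi
  ## BASH_REMATCH itself is left untouched. Only the copies are local.
  major="${BASH_REMATCH[1]}"
  minor="${BASH_REMATCH[2]}"
  printf '%s\n' "major=${major} minor=${minor}"
}

parsed_output="$(parse_version "5.2")"
printf '%s\n' "${parsed_output}"
```

Copying immediately also protects against a later [[ ... =~ ... ]] in the same function overwriting the array.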

POSIX array


On a POSIX shell, positional parameters provide the portable array-like container. $@ has different scope per function or main script. You can build it with set --:

Add items to the array:

set -- a b c

Add items to the beginning or end of the array:

set -- b
set -- a "$@" c
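Put together, a command line can be built and executed from the positional parameters. The following is a sketch that uses printf as a harmless stand-in for a real command. The names dir_with_space, item_count and output are made up for the demonstration. Note that set -- overwrites the positional parameters of the current scope, so in a real script this would usually live inside a function:

```bash
## Uses only POSIX sh features, so it also runs unchanged under bash.
set -o errexit
set -o nounset

## Hypothetical value containing spaces.
dir_with_space="/tmp/dir with space"

## Start with the program name, then append parameters one at a time.
set -- printf
set -- "$@" '%s\n'
set -- "$@" "${dir_with_space}"
set -- "$@" "second item"

item_count="$#"

## Execute the command line. Each parameter stays a single word.
output="$("$@")"

printf '%s\n' "item_count: ${item_count}"
printf '%s\n' "${output}"
```

The value with spaces is passed as one parameter, which is the same guarantee a Bash array gives with "${cmd_list[@]}".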

Use of End-of-Options Parameter (--)


The end-of-options parameter "--" is important because otherwise inputs might be mistaken for command options. This can even become a security issue. Here are examples using the sponge command:

sponge -a testfilename </dev/null

Result: OK. This works because "testfilename" does not look like an option.

sponge -a --testfilename </dev/null

Result: Fail. The command interprets "--testfilename" as options:

sponge: invalid option -- '-'
sponge: invalid option -- 't'
sponge: invalid option -- 'e'
...

sponge -a -- --testfilename </dev/null

Result: OK. The -- signals that "--testfilename" is a filename, not an option.

Conclusion:

  • The -- parameter marks the end of command options.
  • Place -- after all command options and before filenames or other positional parameters, where the command supports it.
  • This technique is applicable to many Unix/Linux commands, not just sponge.
  • It is especially useful when input may begin with -.
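The same effect can be reproduced with commands that are more widely installed than sponge. The sketch below assumes GNU coreutils (touch, rm, rmdir, mktemp with long options) and works inside a temporary directory. The variable names are made up for the demonstration:

```bash
#!/bin/bash

set -o errexit
set -o nounset
set -o errtrace
set -o pipefail

temp_dir="$(mktemp --directory)"
original_dir="${PWD}"
cd -- "${temp_dir}"

## Hypothetical file name that looks like a long option.
file_name="--testfilename"

## Without '--', touch tries to parse the name as an option and fails.
if touch "${file_name}" 2>/dev/null; then
  without_marker="accepted"
else
  without_marker="rejected"
fi

## With '--', the same string is treated as a file name.
touch -- "${file_name}"
if [ -e "./${file_name}" ]; then
  with_marker="created"
else
  with_marker="missing"
fi

printf '%s\n' "without --: ${without_marker}"
printf '%s\n' "with --: ${with_marker}"

## Clean up, again using '--'.
rm -- "${file_name}"
cd -- "${original_dir}"
rmdir -- "${temp_dir}"
```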

nounset - Check if Variable Exists

#!/bin/bash

set -o errexit
set -o nounset
set -o errtrace
set -o pipefail

## Enable for testing.
#unset HOME

if [ -z "${HOME+x}" ]; then
    printf '%s\n' "Error: HOME is not set." >&2
    exit 1
fi

printf '%s\n' "$HOME"
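The ${HOME+x} expansion used above yields x when the variable is set, even if it is empty, and yields nothing when it is unset, so it never triggers a nounset error. A short sketch that tells the states apart, using a hypothetical variable demo_var:

```bash
#!/bin/bash

set -o errexit
set -o nounset
set -o errtrace
set -o pipefail

## demo_var is a hypothetical variable name used only for this demo.
unset -v demo_var

if [ -z "${demo_var+x}" ]; then
  state_before="unset"
else
  state_before="set"
fi

## An empty string still counts as "set".
demo_var=""
if [ -z "${demo_var+x}" ]; then
  state_after="unset"
elif [ -z "${demo_var}" ]; then
  state_after="set but empty"
else
  state_after="set and non-empty"
fi

printf '%s\n' "before assignment: ${state_before}"
printf '%s\n' "after assignment: ${state_after}"
```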

Safely Using Find with NUL-Delimited Input


Example:

Note: The variable could be different. It could, for example, be --/usr.

folder_name="/usr"

printf '%s\0' "${folder_name}" | find -files0-from - -perm /u=s,g=s -print0

Do not use stecho or stprint here, because find -files0-from requires NUL-delimited input.

NUL ("\0") is required because:

The starting points in file have to be separated by ASCII NUL characters. Two consecutive NUL characters, i.e., a starting point with a Zero-length file name is not allowed and will lead to an error diagnostic followed by a non-Zero exit code later. (Source: Debian find man page)

A single trailing NUL is normal. Two consecutive NUL bytes would mean an empty file name entry, which is invalid.
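The example above also emits NUL-delimited output through -print0. A sketch of one way to consume such output in Bash follows. It is not taken from the original text: it uses a plain find starting point rather than -files0-from so that it also runs with older findutils versions, and the temporary directory and file names are made up for the demonstration:

```bash
#!/bin/bash

set -o errexit
set -o nounset
set -o errtrace
set -o pipefail

temp_dir="$(mktemp --directory)"

## Hypothetical file names, including one with a space and one with a
## newline, which would break line-oriented processing.
touch -- "${temp_dir}/plain" "${temp_dir}/with space" "${temp_dir}/with"$'\n'"newline"

found_list=()

## read -d '' splits on NUL bytes, so newlines inside names are preserved.
## temp_dir is an absolute path from mktemp, so it cannot begin with '-'.
while IFS= read -r -d '' found_item; do
  found_list+=("${found_item}")
done < <(find "${temp_dir}" -type f -print0)

printf '%s\n' "found ${#found_list[@]} files"

rm --recursive --force -- "${temp_dir}"
```

A newline-delimited loop would have counted the file with the embedded newline twice.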

Safely Dereferencing Variables


printf '%s\n' "${!var_name}" may execute arbitrary code in the string stored in var_name. Example:

print_var_contents() {
  local var_name
  var_name="$1"
  printf '%s\n' "${!var_name}"
}
print_var_contents 'a[$(uname>&2)0]' # prints 'Linux'

Variable names must be validated before dereferencing them. Variable names consist entirely of letters, numbers, and underscores, and start with a letter or underscore [1], so a regex can be used for validation:

print_var_contents() {
  local var_name
  var_name="$1"
  if ! [[ "${var_name}" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then return 1; fi
  printf '%s\n' "${!var_name}"
}
print_var_contents 'a[$(uname>&2)0]' # prints nothing, returns 1

loops


subshells created by pipelines


Avoid piping data into a loop. This spawns a subshell even without using $() syntax. Bad code example:

str="abc
def
ghi"
line_count=0

printf '%s\n' "${str}" | while IFS= read -r line; do
  ((line_count += 1))
done

printf '%s\n' "${line_count}"

## Expected result: 3
## Actual result: 0

Instead, redirect command output into the loop. Good code example:

str="abc
def
ghi"
line_count=0

while IFS= read -r line; do
  ((line_count += 1))
done < <(printf '%s\n' "${str}")

printf '%s\n' "${line_count}"

## Result: 3
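When the data is already in a variable, a here-string is another way to feed the loop without a pipeline. This is an alternative sketch, not something the original text prescribes:

```bash
#!/bin/bash

set -o errexit
set -o nounset
set -o errtrace
set -o pipefail

str="abc
def
ghi"
line_count=0

## The here-string supplies the loop's stdin. The loop body runs in the
## current shell, so line_count keeps its value after the loop.
while IFS= read -r line; do
  ((line_count += 1))
done <<< "${str}"

printf '%s\n' "${line_count}"
```

As with process substitution, no subshell is involved in running the loop body, so the counter survives.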

stdin stealing


Commands that read from stdin can swallow data that was supposed to be processed by the read component of a while read loop. qrexec-client-vm is one example, and vim is another. Bad code example:

str="abc
def
ghi"

while IFS= read -r line; do
  vim -- "$line"
done < <(printf '%s\n' "${str}")

## Output:
##
## Vim: Warning: Input is not from a terminal
## Vim: Error reading input, exiting...
## Vim: preserving files...
## Vim: Finished.

Work around this by using alternative file descriptors and redirection. Good code example:

str="abc
def
ghi"

while IFS= read -r line 0<&3; do
  vim -- "$line"
done 3< <(printf '%s\n' "${str}")

## Result: Opens "abc", then "def", then "ghi" in Vim.
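The effect can also be shown non-interactively. In this sketch, cat stands in for a stdin-reading command such as vim, and the script's own stdin is redirected from /dev/null so that the demonstration never waits for keyboard input. The counter names are made up for the demonstration:

```bash
#!/bin/bash

set -o errexit
set -o nounset
set -o errtrace
set -o pipefail

## Assumption for the demo: detach from the terminal so that cat never
## waits for keyboard input.
exec 0</dev/null

str="abc
def
ghi"

## Bad: cat inherits the loop's stdin and consumes the remaining lines.
stolen_count=0
while IFS= read -r line; do
  ((stolen_count += 1))
  cat >/dev/null
done < <(printf '%s\n' "${str}")

## Good: the loop input arrives on file descriptor 3, so cat only sees
## the script's original stdin.
protected_count=0
while IFS= read -r line 0<&3; do
  ((protected_count += 1))
  cat >/dev/null
done 3< <(printf '%s\n' "${str}")

printf '%s\n' "iterations with stdin stealing: ${stolen_count}"
printf '%s\n' "iterations with separate file descriptor: ${protected_count}"
```

The first loop stops after one iteration because cat has already swallowed "def" and "ghi".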

misc

base_name="${file_name##*/}"
file_extension="${base_name##*.}"
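A short sketch of what these two parameter expansions produce. The paths are hypothetical. ${file_name##*/} removes the longest prefix ending in /, and ${base_name##*.} removes the longest prefix ending in a dot:

```bash
#!/bin/bash

set -o errexit
set -o nounset
set -o errtrace
set -o pipefail

## Hypothetical path with spaces and a double extension.
file_name="/home/user/folder with space/archive.tar.gz"
base_name="${file_name##*/}"
file_extension="${base_name##*.}"

printf '%s\n' "base_name: ${base_name}"
printf '%s\n' "file_extension: ${file_extension}"

## Caveat: without a dot in the base name, nothing is removed and the
## "extension" equals the base name.
no_dot_name="/usr/bin/bash"
no_dot_base="${no_dot_name##*/}"
no_dot_extension="${no_dot_base##*.}"

printf '%s\n' "no_dot_extension: ${no_dot_extension}"
```

Callers that need to distinguish "no extension" can compare file_extension with base_name before using it.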

coding style

  • no workarounds for older Bash versions. Assume the Bash version of Debian trixie.
  • prefer explicit, readable, whitespace-safe code over compact shell tricks
  • use long options rather than short options when sensible, for example use cp --recursive instead of cp -r
  • no trailing whitespaces allowed in source code files
  • all source code files must have a newline at the end
  • no git style symlinks (git symlinks) (text file without newline at the end) because of a past git symlink CVE
  • avoid unicode whenever possible. See also unicode-show.
  • use:
    • shellcheck
    • avoid rm when safe-rm is appropriate [2]
    • avoid wget and curl in project code, prefer scurl (Secure Downloads)
    • avoid grep for simple string matching in project code, use str_match
    • str_replace
    • append-once
    • overwrite
  • use ${variable} style
  • use shell options
set -o errexit
set -o nounset
set -o errtrace
set -o pipefail
  • do not use:
    • which, use command -v instead. This is because which is an external binary, whereas command is a shell built-in.
  • file name extensions:
    • POSIX sh libraries: .sh
    • Bash libraries: .bsh
    • Executables: no file name extension
    • (executables = scripts that can be run but cannot be sourced, libraries = scripts that can be sourced but may optionally be run as well)

pipefail and early-exiting consumers


This combination can be an issue because the consumer may exit early and the producer may then receive SIGPIPE (broken pipe).

#!/bin/bash

set -o errexit
set -o nounset
set -o errtrace
set -o pipefail

for i in {1..10000}; do
  printf '%s\n' "0"
done | grep --quiet -- "0"

This can fail even though grep --quiet finds a match. grep --quiet exits as soon as it has enough input, while the producer may still be writing. With pipefail enabled, the producer's non-zero exit status can then make the whole pipeline fail.

Guideline:

  • Avoid producer | grep --quiet -- pattern when pipefail is enabled.
  • Prefer matching directly against a variable or file when possible.
  • In project code, prefer helper functions such as str_match where they fit the use case.
  • If an early-exit consumer is intentional, handle exit statuses explicitly instead of assuming the pipeline is harmless.
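One way to follow the second guideline is to capture the producer's output first and then match against the variable, so that no consumer can exit while the producer is still writing. This sketch uses Bash pattern matching rather than the project's str_match helper, whose interface is not described here. The variable names are made up for the demonstration:

```bash
#!/bin/bash

set -o errexit
set -o nounset
set -o errtrace
set -o pipefail

## The producer runs to completion inside the command substitution.
## No pipeline is involved, so it cannot receive SIGPIPE.
producer_output="$(
  for i in {1..10000}; do
    printf '%s\n' "0"
  done
)"

## Match directly against the variable.
if [[ "${producer_output}" == *"0"* ]]; then
  match_result="match"
else
  match_result="no match"
fi

printf '%s\n' "${match_result}"
```

The trade-off is that the whole output is held in memory, which is acceptable for small outputs but may not be for very large ones.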

Improved Error Handler


Inspired by stringent.sh

if (( "$BASH_SUBSHELL" >= 1 )); then
  kill "$$"
fi

Usually not needed. When a subshell detects an error due to errexit and errtrace, it returns a non-zero exit status and the parent shell also sees the failure. Preventing the error handler from running twice is only useful in rare cases.
