Invisible Malicious Unicode Risks
This wiki page explains the security risk posed by invisible Unicode characters, which can be copied and pasted into terminal emulators or introduced as vulnerabilities or backdoors through source code contributions. It also documents how to check files and folders for malicious Unicode.
Unicode as a Security Risk
Some Unicode characters are invisible and can trigger malicious behavior when they are copied along with the visible text. This is a security risk:
- A) For users: Commands copied and pasted into a terminal emulator.
- B) For developers: Introduction of invisible vulnerabilities or backdoors through source code contributions.
These adversarial encodings likely produce no visual artifacts in most editors and terminals.
Original attack research: https://trojansource.codes/
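To illustrate why such characters go unnoticed, the following minimal Python sketch rewrites a string so that every character in the Unicode "format" category (Cf) is shown as an explicit escape. This category includes zero-width characters and the bidirectional control characters. The function name `reveal` and the sample string are hypothetical and only serve as an illustration; this is one possible way to inspect pasted text, not a complete defense.

```python
import unicodedata

def reveal(text: str) -> str:
    """Return text with each Unicode format character (category Cf)
    replaced by a visible <U+XXXX NAME> marker."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            # Fall back to "UNKNOWN" if the character has no name.
            name = unicodedata.name(ch, "UNKNOWN")
            out.append(f"<U+{ord(ch):04X} {name}>")
        else:
            out.append(ch)
    return "".join(out)

# Hypothetical pasted text containing a zero-width space and a
# right-to-left override, neither of which is normally displayed.
pasted = "echo safe\u200b \u202edone"
print(reveal(pasted))
# prints: echo safe<U+200B ZERO WIDTH SPACE> <U+202E RIGHT-TO-LEFT OVERRIDE>done
```

Text that contains no format characters is returned unchanged, so the output can be compared against what the terminal or editor displayed.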
Checking Files for Unicode
NOTE: Not all Unicode in files is malicious. Only some Unicode characters in some files are suspicious or potentially malicious.
grep-find-unicode-wrapper can help to check files for Unicode.
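The distinction can be sketched in Python as a rough three-way classification. The function name `classify` and the bucket labels are hypothetical, and the rule (Unicode category Cf means invisible formatting) is a simplifying assumption: visible look-alike characters can also be abused, and this heuristic does not catch them.

```python
import unicodedata

def classify(ch: str) -> str:
    """Roughly classify a single character (heuristic only)."""
    if ord(ch) < 128:
        return "ascii"
    if unicodedata.category(ch) == "Cf":
        # Format characters: zero-width and bidirectional controls.
        return "invisible-format"
    return "other-non-ascii"

print(classify("a"))       # ascii
print(classify("\u00e9"))  # other-non-ascii (e with acute accent)
print(classify("\u2066"))  # invisible-format (LEFT-TO-RIGHT ISOLATE)
```

An accented letter in a translated text file is usually harmless, while an invisible format character inside source code or a shell command deserves a closer look.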
Syntax for files:
Example for files:
Note: The following example checks the file ~/.bashrc. Replace
~/.bashrc with the actual file to check.
Syntax for folders:
grep-find-unicode-wrapper -r /path/to/folder
Example for folders:
Note: The following example checks the user's home folder. Replace
~/ with a different folder if another folder should be checked.
grep-find-unicode-wrapper -r ~/
- A) If no Unicode has been found: None.
- B) If Unicode has been found: All lines that include Unicode.
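The behavior described above (no output for clean files, otherwise every line that contains Unicode) can be approximated with a short Python sketch. This is not the implementation of grep-find-unicode-wrapper; the function names `lines_with_non_ascii` and `scan` and the `path:line:text` output format are assumptions made for illustration. It flags every line containing a non-ASCII byte, so legitimate Unicode is reported as well.

```python
import os

def lines_with_non_ascii(path):
    """Yield (line_number, text) for each line containing a non-ASCII byte."""
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            if any(byte > 0x7F for byte in raw):
                text = raw.rstrip(b"\r\n").decode("utf-8", errors="replace")
                yield number, text

def scan(target):
    """Return 'path:line_number:text' entries for a file or, recursively, a folder."""
    if os.path.isdir(target):
        paths = [
            os.path.join(root, name)
            for root, _dirs, files in os.walk(target)
            for name in files
        ]
    else:
        paths = [target]
    hits = []
    for path in sorted(paths):
        if not os.path.isfile(path):
            continue  # skip sockets, FIFOs, broken symlinks
        try:
            for number, text in lines_with_non_ascii(path):
                hits.append(f"{path}:{number}:{text}")
        except OSError:
            continue  # unreadable files are skipped in this sketch
    return hits

# Usage (hypothetical path): print every matching line, or nothing if clean.
# for hit in scan("/path/to/folder"):
#     print(hit)
```

Reading files in binary mode avoids decoding errors on files that are not valid UTF-8; such lines are still reported, with undecodable bytes shown as replacement characters.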
GCC protects against this (https://www.phoronix.com/news/GCC-LLVM-Trojan-Source), but other compilers and script interpreters do not even have bug reports about it.