Sleeper Cell - a method of embedding invisible programs into source code

November 10, 2021

It would surprise me if I’m the first person to create something like this, let alone think of it. That being said, I’ve figured out a way to secretly embed one Python program inside another without the embedded program being visible in the host program. It works like this:

In Unicode there are things known as zero-width characters (ZWCs). These are characters that can be typed out but don’t actually display anything, and Unicode has several of them. If we take two of these and treat one as a “1” and the other as a “0”, suddenly we have the basic building blocks of a program.
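Here is a minimal sketch of that idea, assuming U+200B (zero width space) stands for “1” and U+200C (zero width non-joiner) stands for “0”. The helper name bits_to_zwc is made up for illustration.

```python
ONE = "\u200b"   # ZERO WIDTH SPACE, assumed to stand for "1"
ZERO = "\u200c"  # ZERO WIDTH NON-JOINER, assumed to stand for "0"

def bits_to_zwc(bits: str) -> str:
    """Swap each "1"/"0" in a bit string for its zero-width stand-in."""
    return bits.translate(str.maketrans({"1": ONE, "0": ZERO}))

hidden = bits_to_zwc("01000001")  # the bit pattern for ASCII "A"
print(repr(hidden))  # repr() escapes the characters so they can be seen
print(len(hidden))   # 8 characters, even though nothing is displayed
```

Printing hidden directly would show what looks like an empty line, which is the whole point.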

Let’s start with 2 programs, victim.py

# This is a normal comment
print("Hello, World!")

and exploit.py

print("all your base are belong to us")

The goal is to secretly embed exploit.py into victim.py. Because ZWCs are still characters as far as the interpreter is concerned, we can’t put them just anywhere we want to, but comments are fair game. (Technically we could also put them in strings without affecting the “runnability” of the program, but that changes the contents of those strings and could easily cause issues at runtime, which would increase the likelihood that the exploit would be found out.)
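A small sketch of why comments are the safe spot: it compiles two one-line programs, one with a ZWC inside a comment and one with a ZWC in the middle of the code. The helper compiles and the sample strings are illustrative, not part of the original scripts.

```python
ZWC = "\u200b"  # ZERO WIDTH SPACE

in_comment = f"x = 1  # a normal comment{ZWC}{ZWC}\n"
in_code = f"x = {ZWC}1\n"

def compiles(source: str) -> bool:
    """Return True if Python accepts the source text."""
    try:
        compile(source, "<sketch>", "exec")
        return True
    except SyntaxError:
        return False

print(compiles(in_comment))  # the comment hides the characters from the parser
print(compiles(in_code))     # outside a comment or string they are rejected
```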

In order to get the invisible binary that we need, we first read the exploit, convert it to hex for easier mutability, convert that to binary, then replace the binary digits with our ZWCs of choice. In my case, I went with “\u200B” for “1” and “\u200C” for “0”. I also chose to add a header and footer to the code (eight “\u200B” characters and eight “\u200C” characters respectively):

from pathlib import Path
import pyperclip

HEX2BINARY = {
    "0": "0000",
    "1": "0001",
    "2": "0010",
    "3": "0011",
    "4": "0100",
    "5": "0101",
    "6": "0110",
    "7": "0111",
    "8": "1000",
    "9": "1001",
    "a": "1010",
    "b": "1011",
    "c": "1100",
    "d": "1101",
    "e": "1110",
    "f": "1111",
}
BINARY_STRING = ""
HEADER = "\u200B\u200B\u200B\u200B\u200B\u200B\u200B\u200B"
FOOTER = "\u200C\u200C\u200C\u200C\u200C\u200C\u200C\u200C"
EXPLOIT_FILE = Path("exploit.py")

# Read the exploit and turn its bytes into a string of hex digits
with open(EXPLOIT_FILE, "r", encoding="utf8") as file:
    exploit_code = file.read().encode().hex()

# Expand each hex digit into its four-bit binary form
for hex_digit in exploit_code:
    BINARY_STRING += HEX2BINARY[hex_digit]

BINARY_STRING = BINARY_STRING.replace("1", "\u200B")
BINARY_STRING = BINARY_STRING.replace("0", "\u200C")

BINARY_STRING = HEADER + BINARY_STRING + FOOTER

pyperclip.copy(BINARY_STRING)
print("The exploit has been copied to your clipboard.")

It’s a bit crude and hamfisted, but hey, it works ¯\_(ツ)_/¯. The converted code is automatically copied to the clipboard to save me the trouble of trying to copy an invisible string. Now I just paste that into the comment and voila:
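For comparison, here is a more compact sketch of the same encoding step. It swaps the hex lookup table for format(byte, "08b"), which produces the same eight bits per byte. The function name encode and the sample payload are illustrative, and the clipboard step is left out.

```python
ONE, ZERO = "\u200b", "\u200c"   # same stand-ins as the script above
HEADER, FOOTER = ONE * 8, ZERO * 8

def encode(source: str) -> str:
    """Turn source text into header + zero-width bit string + footer."""
    bits = "".join(format(byte, "08b") for byte in source.encode("utf8"))
    return HEADER + bits.replace("1", ONE).replace("0", ZERO) + FOOTER

payload = encode('print("hi")')
print(len(payload))  # 8 header + 11 bytes * 8 bits + 8 footer characters
```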

# This is a normal comment​​​​​​​​‌​​​‌‌‌‌‌​​​‌‌​‌‌​​‌​‌‌​‌​​‌​​​‌‌​​​‌​‌‌‌‌​‌​‌‌‌‌‌​‌‌‌​‌‌​​‌‌‌‌​‌​​‌​​‌‌‌​​‌​​‌‌‌‌​‌‌‌‌‌‌​​​​‌‌​‌​​‌​​​​‌​​​‌​‌​‌​​​‌‌​‌‌‌​‌‌‌‌‌‌​​‌‌‌​‌‌​​‌‌‌‌​‌​​​‌‌​​‌​​‌‌​‌​‌‌​‌‌‌‌‌‌​​‌‌‌‌​‌​​​‌‌​‌‌​​‌‌​‌​‌‌​‌‌‌‌‌‌​​‌‌‌​‌‌​​‌‌​‌​‌​​‌​​‌‌‌​​‌​​​​‌​​‌​​​‌‌​​‌‌​​​‌‌​‌‌‌‌‌‌​​​‌​‌‌‌​​‌​​​​‌‌​‌‌‌‌‌‌​​​‌​‌​‌​​​‌‌​​‌‌​‌‌‌​‌‌‌​‌​‌‌​‌‌‌‌‌‌‌‌
print("Hello, World!")

The exploit is located at the end of the comment (1’s and 0’s for display purposes):

# This is a normal comment1111111101110000011100100110100101101110011101000010100000100010011000010110110001101100001000000111100101101111011101010111001000100000011000100110000101110011011001010010000001100001011100100110010100100000011000100110010101101100011011110110111001100111001000000111010001101111001000000111010101110011001000100010100100000000
print("Hello, World!")

The decoder is almost as simple as the encoder, with a simple loop at the beginning to check for the header and footer, as well as some bit shifting:

from pathlib import Path

def bitstring_to_bytes(encoded_exploit):
    exploit2int = int(encoded_exploit, 2)
    byte_array = bytearray()
    while exploit2int:
        byte_array.append(exploit2int & 0xff)
        exploit2int >>= 8
    return bytes(byte_array[::-1])

ENCODED_EXPLOIT = ""
VICTIM = Path("victim.py")
BITS_TO_SEPARATE=8
HEADER = "\u200B\u200B\u200B\u200B\u200B\u200B\u200B\u200B"
FOOTER = "\u200C\u200C\u200C\u200C\u200C\u200C\u200C\u200C"

with open(VICTIM, "r", encoding="utf8") as file:
    contents = file.read()

# Scan for the header, then collect characters until the footer appears
for character in contents:
    ENCODED_EXPLOIT += character
    if ENCODED_EXPLOIT.endswith(HEADER):
        ENCODED_EXPLOIT = ENCODED_EXPLOIT[-8:]
    if ENCODED_EXPLOIT.endswith(FOOTER):
        break

# convert zero width characters to binary (replace() swaps every occurrence)
ENCODED_EXPLOIT = ENCODED_EXPLOIT.replace("\u200B", "1")
ENCODED_EXPLOIT = ENCODED_EXPLOIT.replace("\u200C", "0")

ENCODED_EXPLOIT = ENCODED_EXPLOIT[8:-8]
exec(bitstring_to_bytes(ENCODED_EXPLOIT))

And when we run this code, it outputs our exploit:

all your base are belong to us
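One caveat with the decoder above: it stops at the first run of eight “0” characters, and such a run can occur inside a payload. For example, the character “0” (00110000) followed by a newline (00001010) contains eight zero bits in a row. Below is a hedged sketch of an alternative that takes everything between the first header and the last footer and decodes it one byte at a time. It assumes the file holds a single payload, and the name extract is made up for illustration.

```python
ONE, ZERO = "\u200b", "\u200c"
HEADER, FOOTER = ONE * 8, ZERO * 8

def extract(text: str) -> bytes:
    """Recover the hidden bytes between the first header and the last footer."""
    start = text.find(HEADER)
    end = text.rfind(FOOTER)
    if start == -1 or end < start + len(HEADER):
        raise ValueError("no embedded payload found")
    zwc = text[start + len(HEADER):end]
    bits = zwc.replace(ONE, "1").replace(ZERO, "0")
    # Decode eight bits at a time so leading zero bytes are not lost.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

# Build a victim whose payload contains "0" followed by a newline.
secret = b"x = 0\nprint(x)\n"
hidden = "".join(format(b, "08b") for b in secret)
hidden = hidden.replace("1", ONE).replace("0", ZERO)
victim = "# This is a normal comment" + HEADER + hidden + FOOTER + "\nprint(1)\n"

print(extract(victim).decode("utf8"))
```

The sketch only recovers the bytes. Passing them to exec(), as the original decoder does, is a separate step.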

Potential for Harm

With the culture of copy-pasting that surrounds software, it’s not hard to imagine how a non-malicious software dev just doing their day job could unintentionally include this vulnerability in their system. While this article focuses on hiding the code in a Python comment, its use is not limited to Python comments. In theory the hidden code could be included in any source code, or in files that aren’t source code at all: text files, CSVs, and any other file that accepts arbitrary text input are susceptible.

The good news is that the invisible source code is no threat on its own. Other than bloating file sizes, at least in the context of Python files, it has no way of running and requires a secondary script to actually activate it (hence the Sleeper Cell terminology). This brings the value of this exploit (if we can even call it that) into question somewhat: if we were already able to plant the exploit in the first place, why not just run it right away? I’d be interested in the practical applications of such a method since I’m not really involved in the security space.

Mitigation

VS Code has an extension that can detect bad characters: https://marketplace.visualstudio.com/items?itemName=wengerk.highlight-bad-chars

There are likely similar tools for other editors to help catch these kinds of characters.
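If no editor plugin is at hand, a few lines of Python can do a rough version of the same check. This is only a sketch: it flags every character in Unicode’s “Cf” (format) category, which covers U+200B and U+200C but also legitimate characters such as the joiners used in some emoji and scripts, so expect false positives. The function name find_invisible is made up for illustration.

```python
import unicodedata

def find_invisible(text: str):
    """Yield (line, column, name) for every format-category character."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, char in enumerate(line, start=1):
            # "Cf" is the Unicode "format" category (includes U+200B, U+200C)
            if unicodedata.category(char) == "Cf":
                yield line_no, col, unicodedata.name(char, "UNKNOWN")

sample = "# a comment\u200b\u200c\nprint('Hello, World!')\n"
for hit in find_invisible(sample):
    print(hit)
```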

Related

If you’d like to see the source code it is available here: https://github.com/Mockapapella/SleeperCell

If you enjoyed this article, you might like another article I wrote a while back about Python variables, Unicode, and fonts: https://www.thelisowe.com/why-can-be-a-variable-in-python-but-not/