
feat: PE cert signatures, headless browser stealer, COM office martian, ransomware_message fix #571

Open · wmetcalf wants to merge 3 commits into CAPESandbox:master from wmetcalf:feat/new-detection-signatures

@wmetcalf (Contributor)

New Signatures

pe_cert_suspicious.py (all)

Three signatures for suspicious Authenticode certificates:

pe_cert_self_signed (severity 3) — PE signed with a self-generated certificate. Detects when subject CN == issuer CN, excluding well-known root CAs (DigiCert, Entrust, etc.). Common in malware that generates throwaway signing certs to appear legitimate.
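A minimal sketch of the self-signed test, assuming each signer entry has already been reduced to a dict with hypothetical `subject_cn` and `issuer_cn` keys (the actual field names in CAPE's `digital_signers`/`guest_signers` data may differ). The root-CA allowlist is illustrative, not the signature's real list.

```python
# Illustrative allowlist only; the real signature's list of well-known roots may differ.
KNOWN_ROOT_CAS = ("digicert", "entrust", "globalsign", "sectigo")


def is_self_signed(signer):
    """True when subject CN == issuer CN and the CN is not a known root CA.

    `signer` uses the hypothetical keys `subject_cn` and `issuer_cn`.
    """
    subject = (signer.get("subject_cn") or "").strip().lower()
    issuer = (signer.get("issuer_cn") or "").strip().lower()
    if not subject or subject != issuer:
        return False
    # Genuine root CAs are self-signed by design, so they are excluded.
    return not any(root in subject for root in KNOWN_ROOT_CAS)


if __name__ == "__main__":
    print(is_self_signed({"subject_cn": "Acme Tools", "issuer_cn": "Acme Tools"}))  # True
    print(is_self_signed({"subject_cn": "DigiCert Global Root CA",
                          "issuer_cn": "DigiCert Global Root CA"}))  # False
```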

pe_cert_suspicious_issuer (severity 3) — PE signed by an unrecognized CA with red flags: single-cert chain (no intermediate CA), domain-style subject CN (e.g. 112bhv.nl), or validity window < 180 days. Pattern seen in malware using certs from low-trust/compromised issuers.
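The issuer red flags could be checked along these lines. This is a sketch, not the PR's code: the function name, its parameters, the flag labels, and the domain-style regex are assumptions, and it presumes the chain length, subject CN, and validity dates have already been parsed from the certificate.

```python
import re
from datetime import datetime

# Assumed heuristic for a "domain-style" CN such as 112bhv.nl:
# one or more dot-terminated labels followed by an alphabetic TLD, no spaces.
DOMAIN_CN_RE = re.compile(r"^(?:[a-z0-9-]+\.)+[a-z]{2,}$", re.IGNORECASE)
SHORT_VALIDITY_DAYS = 180


def issuer_red_flags(chain_len, subject_cn, not_before, not_after):
    """Return the (hypothetically labelled) red flags raised by one signer."""
    flags = []
    if chain_len == 1:
        # Leaf certificate with no intermediate CA in the chain.
        flags.append("single_cert_chain")
    if DOMAIN_CN_RE.match((subject_cn or "").strip()):
        flags.append("domain_style_cn")
    if (not_after - not_before).days < SHORT_VALIDITY_DAYS:
        flags.append("short_validity")
    return flags


if __name__ == "__main__":
    print(issuer_red_flags(1, "112bhv.nl", datetime(2024, 1, 1), datetime(2024, 4, 1)))
    # prints ['single_cert_chain', 'domain_style_cn', 'short_validity']
```

Whether a single flag is enough to fire, or some combination is required, is not stated in the description; the sketch only collects the flags.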

pe_cert_invalid_signature (severity 4) — Signature failed cryptographic verification. Distinguishes definitive failures (hash mismatch 0x80096010, chain can't be built 0x800B010A, revoked 0x800B0109) from sandbox trust-store gaps ("not trusted by trust provider") which are normal in analysis VMs.
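A sketch of that distinction, assuming the verification result is available as a WinVerifyTrust-style HRESULT (the PR does not say which report field carries it, and `classify_verification` is hypothetical). Constant names and values are taken from winerror.h. Note that in winerror.h 0x800B0109 is CERT_E_UNTRUSTEDROOT, whose message is the "not trusted by the trust provider" case described above as a benign trust-store gap, while a revoked certificate is CERT_E_REVOKED (0x800B010C); this sketch follows winerror.h, so the code list used by the signature may be worth re-checking.

```python
# HRESULT names and values as defined in the Windows SDK header winerror.h.
TRUST_E_BAD_DIGEST = 0x80096010    # digital signature of the object did not verify
CERT_E_UNTRUSTEDROOT = 0x800B0109  # chain ends in a root "not trusted by the trust provider"
CERT_E_CHAINING = 0x800B010A       # chain could not be built to a trusted root authority
CERT_E_REVOKED = 0x800B010C        # certificate was explicitly revoked by its issuer

DEFINITIVE_FAILURES = {TRUST_E_BAD_DIGEST, CERT_E_CHAINING, CERT_E_REVOKED}


def classify_verification(hresult):
    """Map an HRESULT to a coarse verdict (labels are hypothetical)."""
    code = hresult & 0xFFFFFFFF  # normalize signed 32-bit values such as -2146869232
    if code == 0:
        return "valid"
    if code in DEFINITIVE_FAILURES:
        return "invalid"
    if code == CERT_E_UNTRUSTEDROOT:
        # Expected in analysis VMs with a minimal trust store; not evidence of tampering.
        return "trust_store_gap"
    return "unknown"
```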

Requires CAPEv2 parse_pe.py fix for cryptography ≥ 40.x (companion PR kevoreilly/CAPEv2#3018).


stealer_headless_browser.py (all)

Detects the credential-extraction phase of browser stealers: browsers launched headless with logging suppressed (--headless --disable-logging --log-level=3) from a suspicious parent directory (Temp, AppData, ProgramData).

Pattern observed: malware from %TEMP% first probes installed browsers with --disable-gpu about:blank, then re-launches them headless+silent to access saved passwords, cookies, and session tokens. Fires when 3+ different browser binaries are launched this way (multi-browser sweep) or when the process tree confirms the suspicious parent.
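One reading of the firing rule, sketched over a flattened list of process launches. The input shape (dicts with hypothetical `image`, `cmdline`, and `parent_image` keys), the reduced directory regex, and the function names are assumptions; the real signature works from CAPE's behavior and process-tree results.

```python
import ntpath
import re

BROWSERS = {"chrome.exe", "brave.exe", "msedge.exe", "firefox.exe", "opera.exe"}
SILENT_FLAGS = ("--headless", "--disable-logging", "--log-level=3")
# Assumed subset of the "suspicious parent directory" rule.
SUSPICIOUS_DIR_RE = re.compile(r"\\(?:temp|appdata|programdata)\\", re.IGNORECASE)
SWEEP_THRESHOLD = 3  # distinct browser binaries that make a multi-browser sweep


def is_silent_headless(cmdline):
    """True when every silencing flag from the description appears in the command line."""
    lower = (cmdline or "").lower()
    return all(flag in lower for flag in SILENT_FLAGS)


def should_fire(launches):
    """launches: list of dicts with hypothetical keys image, cmdline, parent_image."""
    browsers = set()
    parent_confirmed = False
    for launch in launches:
        # ntpath splits Windows paths correctly even on a Linux analysis host.
        name = ntpath.basename(launch.get("image") or "").lower()
        if name not in BROWSERS or not is_silent_headless(launch.get("cmdline")):
            continue
        browsers.add(name)
        if SUSPICIOUS_DIR_RE.search(launch.get("parent_image") or ""):
            parent_confirmed = True
    # Fire on a multi-browser sweep, or on any silent headless launch from a suspicious parent.
    return len(browsers) >= SWEEP_THRESHOLD or parent_confirmed
```

The initial `--disable-gpu about:blank` probe does not match, because it lacks the silencing flags; only the later credential-extraction launches count.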


com_process_activation.py (all)

Detects Office applications (Excel, Word, Outlook, etc.) that COM-activated a suspicious process (mshta, powershell, cmd, wscript, etc.) via the DCOM broker. The LethalHTA technique embeds HTA/ActiveX objects in Office documents; when activated, Windows launches mshta.exe -Embedding via svchost as the COM surrogate — hiding the true parent. This signature only fires when the CAPEv2 process tree enrichment confirms an actual COM subprocess was spawned.
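A sketch of the enriched-tree walk, using the `com_logical_parent_pid`, `com_logical_parent_name`, `com_progid`, and `com_clsid` fields named in the PR. The `children` key and the `SUSPICIOUS_CHILDREN` set are assumptions; the set mirrors the tightening the author later describes as a follow-up, not the current PR code.

```python
import ntpath

OFFICE_ACTIVATORS = {
    "excel.exe", "winword.exe", "powerpnt.exe", "outlook.exe",
    "msaccess.exe", "mspub.exe", "visio.exe",
}
# Assumption: an explicit suspicious-child set (follow-up tightening, hypothetical).
SUSPICIOUS_CHILDREN = {"mshta.exe", "powershell.exe", "cmd.exe", "wscript.exe", "cscript.exe"}


def find_com_spawns(nodes):
    """Walk an enriched process tree nested via a hypothetical 'children' key."""
    hits = []
    stack = list(nodes)
    while stack:  # iterative DFS avoids recursion limits on deep trees
        node = stack.pop()
        stack.extend(node.get("children") or [])
        lpid = node.get("com_logical_parent_pid")
        lname = ntpath.basename(node.get("com_logical_parent_name") or "").lower()
        child = ntpath.basename(node.get("name") or "").lower()
        if lpid and lname in OFFICE_ACTIVATORS and child in SUSPICIOUS_CHILDREN:
            hits.append({
                "spawned": "%s (pid %s)" % (node.get("name"), node.get("pid")),
                "logical_parent": "%s (pid %s)" % (node.get("com_logical_parent_name"), lpid),
                "via": node.get("com_progid") or node.get("com_clsid", ""),
            })
    return hits
```

`ntpath.basename` is used because, on a POSIX host, `os.path.basename` does not split on backslashes; that only matters if the name field holds a full Windows path.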

Requires behavior.py COM enrichment (companion PR kevoreilly/CAPEv2#3019).


Bug Fixes

martians_office.py

Add COM-logical children check to the existing Office martians signature. The OS-tree walk was missing LethalHTA spawns because mshta.exe's OS parent is svchost, not Excel. Adds _check_com_martians() that walks the enriched processtree for nodes with com_logical_parent_pid pointing to an Office process — same whitelist as the existing walk.
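The COM-logical martians check could look roughly like this: collect the PIDs of Office processes anywhere in the tree, then flag nodes whose `com_logical_parent_pid` points at one of them and whose image is not whitelisted. `check_com_martians`, `CHILD_WHITELIST`, and the `children` key are hypothetical; the real method reuses the whitelist of the existing OS-tree walk.

```python
import ntpath

OFFICE_NAMES = {"excel.exe", "winword.exe", "powerpnt.exe", "outlook.exe",
                "msaccess.exe", "mspub.exe", "visio.exe"}
# Hypothetical whitelist of benign Office children.
CHILD_WHITELIST = {"splwow64.exe", "werfault.exe", "dwwin.exe"}


def _iter_nodes(nodes):
    """Yield every node of a tree nested via a hypothetical 'children' key."""
    for node in nodes:
        yield node
        yield from _iter_nodes(node.get("children") or [])


def _image(node):
    return ntpath.basename(node.get("name") or "").lower()


def check_com_martians(tree):
    nodes = list(_iter_nodes(tree))
    office_pids = {n.get("pid") for n in nodes if _image(n) in OFFICE_NAMES}
    office_pids.discard(None)
    martians = []
    for node in nodes:
        # The OS parent (e.g. svchost) is ignored; only the COM-logical parent matters here.
        if node.get("com_logical_parent_pid") in office_pids and _image(node) not in CHILD_WHITELIST:
            martians.append((node.get("pid"), _image(node)))
    return martians
```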

ransomware_message.py

Fix TypeError: can't use a bytes pattern on a string-like object in re2. Indicators were encoded to bytes and joined with b"|" producing a bytes regex, but buff.lower() returns a str. Changed to compile a plain str pattern matching the str input.
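The str/bytes rule behind the fix, shown with the standard-library `re` module, which enforces the same rule as the re2 binding in the error quoted above. The indicator list is a stand-in, `find_indicators` is hypothetical, and `re.escape` is this sketch's choice so that punctuation in an indicator matches literally.

```python
import re

# Stand-in indicator list; the signature's real list is longer.
INDICATORS = ["your files have been encrypted", "bitcoin", "decrypt", "what happened"]
# A str pattern: whatever it is matched against must also be str.
REGEX = re.compile("|".join(re.escape(i) for i in INDICATORS))


def find_indicators(buff):
    """Return the distinct indicators found in a str or bytes buffer."""
    if isinstance(buff, (bytes, bytearray)):
        # Raw API buffers arrive as bytes; decode before applying the str pattern.
        buff = bytes(buff).decode("utf-8", errors="ignore")
    return set(REGEX.findall(buff.lower()))
```

Decoding with `errors="ignore"` recovers UTF-8 or ASCII text; a note written as UTF-16LE would still need a different decode.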

🤖 Generated with Claude Code

**New signatures:**

- `pe_cert_suspicious.py` (all): Three PE Authenticode cert signatures:
  - `pe_cert_self_signed` (sev 3): PE signed with self-signed cert (subject == issuer,
    excluding known root CAs). Uses both digital_signers and guest_signers data.
  - `pe_cert_suspicious_issuer` (sev 3): PE signed by unrecognized CA with incomplete
    chain, domain-style subject CN, or short validity window (< 180 days).
  - `pe_cert_invalid_signature` (sev 4): Signature failed cryptographic verification
    (hash mismatch 0x80096010, revoked 0x800B0109, chain can't be built 0x800B010A).
    Distinguishes definitive failures from sandbox trust-store gaps.

- `stealer_headless_browser.py` (all): Detects browser stealers launching browsers
  headless with logging suppressed (--headless --disable-logging --log-level=3)
  from a suspicious parent directory. Fires when 3+ browsers are launched this way
  (multi-browser sweep = high confidence) or when the process tree confirms the
  suspicious parent. Catches the credential-extraction phase that follows the
  initial browser probe.

- `com_process_activation.py` (all): Detects Office applications (Excel, Word, etc.)
  that COM-activated a suspicious process (mshta, powershell, cmd, etc.) via the
  DCOM broker — the LethalHTA / OLE embedding attack pattern. Only fires when the
  process tree enrichment confirms an actual subprocess was spawned (requires
  CAPEv2 behavior.py network_map COM enrichment).

**Bug fixes:**

- `martians_office.py`: Add COM-logical children check. The existing OS-process-tree
  walk misses LethalHTA spawns because mshta's OS parent is svchost, not Excel.
  Added `_check_com_martians()` that walks the enriched processtree for nodes with
  `com_logical_parent_pid` pointing to an Office process.

- `ransomware_message.py`: Fix `TypeError: can't use a bytes pattern on a string-like
  object` in re2. `indicators` were encoded to bytes and joined with `b"|"` producing
  a bytes regex, but `buff.lower()` returns a str. Changed to compile a str pattern.
Copilot AI review requested due to automatic review settings May 12, 2026 16:21

@gemini-code-assist [bot] left a comment


Code Review

This pull request introduces several new signatures for detecting malicious behavior, including COM-activated process spawning from Office applications, suspicious or invalid PE certificates, and headless browser launches used for credential theft. It also refactors the RansomwareMessage signature and updates MartiansOffice. Review feedback identifies several critical issues in the ransomware detection logic, specifically potential TypeError exceptions in Python 3 when applying string-based regex to raw bytes, and a crash risk during JSON serialization when reporting raw byte buffers. Additionally, the reviewer noted the accidental removal of the MassRansomNoteDrop class, a performance bottleneck in process tree traversal, and a regression caused by removing a filename fallback in the ransomware signature.

```python
buff = self.get_raw_argument(call, "Buffer")
if buff and len(buff) >= 128:
    buff_lower = buff.lower()
    matches = set(self.regex.findall(buff_lower))
```

high

In Python 3, using a string-compiled regex (self.regex) on a bytes object (buff_lower) will raise a TypeError. Since get_raw_argument returns bytes, you should either decode the buffer or compile the regex as bytes.

Suggested change
```diff
-matches = set(self.regex.findall(buff_lower))
+matches = set(self.regex.findall(buff_lower.decode("utf-8", errors="ignore")))
```

```python
matches = set(self.regex.findall(buff_lower))
if len(matches) > 1:
    self.data.append({"ransom_note": filepath})
    self.data.append({"beginning_of_ransom_message": buff})
```

high

Appending raw bytes (buff) to self.data will cause a crash during report generation when the results are serialized to JSON. Ensure the data is converted to a string.

Suggested change
```diff
-self.data.append({"beginning_of_ransom_message": buff})
+self.data.append({"beginning_of_ransom_message": buff.decode("utf-8", errors="ignore")})
```

Comment on lines +186 to +187
```python
if isinstance(filedata, str):
    filedata = filedata.encode("utf-8", errors="ignore")
```

high

Encoding filedata to bytes here conflicts with the string-based regex used later on line 191, which will cause a TypeError in Python 3. If the regex is string-based, you should ensure filedata is a string.

Suggested change
```diff
-if isinstance(filedata, str):
-    filedata = filedata.encode("utf-8", errors="ignore")
+if isinstance(filedata, (bytes, bytearray)):
+    filedata = filedata.decode("utf-8", errors="ignore")
```

```python
for note_name, dirs in notes.items():
    if len(dirs) >= 5:
        self.data.append({"ransom_note": note_name, "pid": pid, "directories_count": len(dirs)})
return self.ret
```

high

The MassRansomNoteDrop signature class was entirely removed from this file. If this was unintentional, it represents a significant loss in detection capability for ransomware that drops identical notes across many directories.


```python
# Very short validity (< 180 days)
try:
    from datetime import datetime
```

medium

Importing datetime inside the run method executes the import statement every time the signature runs; the module is cached in sys.modules, so the cost is small, but a module-level import is more idiomatic. Move this import to the top of the file.

Comment on lines +74 to +78
```python
for parent in (
    self.results.get("behavior", {}).get("processes", []) or []
):
    if parent.get("process_id") != parent_id:
        continue
```

medium

This nested loop over the process list results in $O(N^2)$ complexity, which can significantly impact performance for analyses with many processes. Consider pre-indexing the processes by process_id into a dictionary for $O(1)$ lookup.

Comment on lines +108 to +109
```python
"what happened",
"what happened",
```

medium

The indicator "what happened" is duplicated in the list.

Suggested change
```diff
-"what happened",
-"what happened",
+"what happened",
```


```diff
 def on_call(self, call, process):
-    filepath = self.get_argument(call, "HandleName") or self.get_argument(call, "FileName")
+    filepath = self.get_argument(call, "HandleName")
```

medium

Removing the fallback to FileName is a regression: HandleName is not always populated for NtWriteFile calls. This change might reduce detection coverage for ransom notes written via handles that don't have a resolved name in the HandleName field.

Suggested change
```diff
-    filepath = self.get_argument(call, "HandleName")
+    filepath = self.get_argument(call, "HandleName") or self.get_argument(call, "FileName")
```

Copilot AI left a comment

Pull request overview

This PR adds several new detection signatures (PE Authenticode certificate anomalies, headless-browser stealer behavior, and Office→COM process activation) and updates existing ransomware/Office signatures to improve detection accuracy.

Changes:

  • Added new signatures: pe_cert_suspicious.py, stealer_headless_browser.py, and com_process_activation.py.
  • Updated martians_office.py to also detect COM-logical children spawned by Office (DCOM broker pattern).
  • Refactored ransomware_message.py to fix the re2 bytes/str regex mismatch and adjusted indicator handling.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| `modules/signatures/windows/ransomware_message.py` | Refactors buffer/regex handling and dropped-file scanning; also removes an additional signature class from the file. |
| `modules/signatures/windows/martians_office.py` | Adds COM-logical child detection to catch Office-spawned “martians” hidden behind svchost/DCOM. |
| `modules/signatures/all/stealer_headless_browser.py` | New signature to detect headless+silent browser launches from suspicious parent locations / multi-browser sweeps. |
| `modules/signatures/all/pe_cert_suspicious.py` | New signatures to flag self-signed, suspicious-issuer, and invalid Authenticode signatures. |
| `modules/signatures/all/com_process_activation.py` | New signature to detect Office COM-activated subprocesses via enriched process tree metadata. |
Comments suppressed due to low confidence (1)

modules/signatures/windows/ransomware_message.py:200

  • This PR removes the MassRansomNoteDrop signature entirely, but the PR description only mentions a regex TypeError fix for ransomware_message.py. If the removal is unintentional, restore/move the signature; if intentional, please document the behavior change (and consider deprecating instead of deleting to avoid breaking downstream expectations).
```python
def on_complete(self):
    if not self.ret and "dropped" in self.results:
        for dropped in self.results["dropped"]:

            raw_name = dropped.get("name", "")
            if isinstance(raw_name, list) and len(raw_name) > 0:
                filename = str(raw_name[0]).lower()
            else:
                filename = str(raw_name).lower()

            if (
                filename.endswith((".txt", ".html", ".hta", ".rtf"))
                or "read_me" in filename
                or "readme" in filename
                or "read-me" in filename
            ):
                filedata = dropped.get("data")

                if isinstance(filedata, str):
                    filedata = filedata.encode("utf-8", errors="ignore")

                if filedata and len(filedata) >= 128:
                    filedata_lower = filedata.lower()
                    matches = set(self.regex.findall(filedata_lower))

                    if len(matches) > 1:
                        self.data.append({"ransom_note": filename})
                        self.data.append({"beginning_of_ransom_message": filedata})
                        self.ret = True
                        break

    return self.ret
```



Comment on lines +155 to +166
```python
buff = self.get_raw_argument(call, "Buffer")
if buff and len(buff) >= 128:
buff_lower = buff.lower()
matches = set(self.regex.findall(buff_lower))

if len(buff_str) >= 32:
buff_lower = buff_str.lower()
matches = set(self.regex.findall(buff_lower))
if len(matches) > 1:
self.data.append({"ransom_note": filepath})
self.data.append({"beginning_of_ransom_message": buff})

if len(matches) > 1:
if self.pid:
self.mark_call()
return True
self.ret = True
```

```python
"BTC",
"ethereum",
"what happened",
"what happened",
```
Comment on lines +4 to +12
```python
BROWSER_RE = re.compile(
    r'\\(?:chrome|brave|msedge|firefox|opera)\.exe',
    re.IGNORECASE
)

SUSPICIOUS_PARENT_RE = re.compile(
    r'\\(?:Temp|AppData|ProgramData|Users\\[^\\]+\\(?:AppData|Downloads)|Users\\Public)\\',
    re.IGNORECASE
)
```
Comment on lines +64 to +84
```python
for proc in (
    self.results.get("behavior", {}).get("processes", []) or []
):
    path = proc.get("module_path", "") or proc.get("process_name", "") or ""
    if not BROWSER_RE.search(path):
        continue
    parent_id = proc.get("parent_id")
    if parent_id is None:
        continue
    # Find parent process
    for parent in (
        self.results.get("behavior", {}).get("processes", []) or []
    ):
        if parent.get("process_id") != parent_id:
            continue
        parent_path = parent.get("module_path", "") or ""
        if SUSPICIOUS_PARENT_RE.search(parent_path) and not LEGITIMATE_LAUNCHERS.search(parent_path):
            suspicious_parent = parent_path
            self.data.append({"suspicious_parent": parent_path})
        break
    if suspicious_parent:
```

```python
lower = cmd.lower()
if not BROWSER_RE.search(cmd):
    continue
if "--headless" not in lower:
```

```python
from lib.cuckoo.common.abstracts import Signature


def _get_pe(results):
```
Comment on lines +20 to +38
```python
OFFICE_ACTIVATORS = {
    "excel.exe", "winword.exe", "powerpnt.exe", "outlook.exe",
    "msaccess.exe", "mspub.exe", "visio.exe",
}

def run(self):
    # Only report confirmed COM-spawned subprocesses visible in the enriched tree.
    # Requiring com_logical_parent_pid avoids noise from normal JScript/WMI activations.
    def walk(nodes):
        for node in nodes:
            lpid = node.get("com_logical_parent_pid")
            lname = (node.get("com_logical_parent_name") or "").lower()
            if lpid and os.path.basename(lname) in self.OFFICE_ACTIVATORS:
                self.data.append({
                    "spawned": "%s (pid %s)" % (node.get("name"), node.get("pid")),
                    "logical_parent": "%s (pid %s)" % (
                        node.get("com_logical_parent_name"), lpid),
                    "via": node.get("com_progid") or node.get("com_clsid", ""),
                })
```
@wmetcalf (Contributor, Author)

Round 2 fixes pushed (cab6740) — addressing all reviewer feedback:

ransomware_message.py: get_raw_argument returns bytes; now decoded to str before regex matching and before appending to self.data. filedata in on_complete is similarly decoded rather than encoded. Duplicate "what happened" entry removed. FileName fallback restored to on_call. MassRansomNoteDrop restored.
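For reference, the restored MassRansomNoteDrop logic can be approximated from the diff excerpt quoted in the review (a `notes` mapping, a threshold of five directories, and `ransom_note`/`pid`/`directories_count` fields). The input, a list of file paths written by one process, and the function name are assumptions; the real signature may also restrict which filenames count as notes.

```python
import ntpath
from collections import defaultdict

MIN_DIRECTORIES = 5  # threshold taken from the quoted diff


def mass_note_drops(written_paths, pid):
    """Flag filenames written into at least MIN_DIRECTORIES distinct directories."""
    notes = defaultdict(set)  # lowercased filename -> set of directories
    for path in written_paths:
        directory, name = ntpath.split(path.lower())
        if name:
            notes[name].add(directory)
    return [
        {"ransom_note": name, "pid": pid, "directories_count": len(dirs)}
        for name, dirs in notes.items()
        if len(dirs) >= MIN_DIRECTORIES
    ]
```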

stealer_headless_browser.py: Replaced O(N²) nested loop with proc_by_pid dict for O(1) parent lookup. BROWSER_RE updated to (?<!\w) lookbehind so it matches both bare chrome.exe (process_name field) and full paths. Added Firefox -headless (single-dash) alongside --headless.
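The three round-2 changes to the stealer signature can be illustrated in isolation. This is a sketch under assumptions: the two regexes follow the description (a required leading backslash versus a `(?<!\w)` lookbehind), while `has_headless_flag` and `index_parents` are hypothetical helpers rather than the PR's code.

```python
import re

# Original form: requires a backslash before the name, so a bare "chrome.exe" never matches.
OLD_BROWSER_RE = re.compile(r"\\(?:chrome|brave|msedge|firefox|opera)\.exe", re.IGNORECASE)
# Round-2 form: only requires that no word character precedes the name.
NEW_BROWSER_RE = re.compile(r"(?<!\w)(?:chrome|brave|msedge|firefox|opera)\.exe", re.IGNORECASE)


def has_headless_flag(cmdline):
    """Accept Chromium-style --headless[=...] and Firefox's single-dash -headless."""
    tokens = (cmdline or "").lower().split()
    return any(tok == "-headless" or tok.startswith("--headless") for tok in tokens)


def index_parents(processes):
    """Build the pid index once, then resolve each parent with an O(1) dict lookup."""
    proc_by_pid = {p.get("process_id"): p for p in processes}
    return {p.get("process_id"): proc_by_pid.get(p.get("parent_id")) for p in processes}
```

Building `proc_by_pid` once makes the parent lookup linear in the number of processes overall, instead of rescanning the process list for every browser process.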

pe_cert_suspicious.py: datetime import moved to module level. Re: the target.file.pe vs static.pe suggestion — in this codebase PE data (digital_signers, guest_signers) is normalised into a separate files MongoDB collection and transparently merged back into target.file.pe by a mongo_hook denormalize step on every mongo_find_one call. static.pe is always empty in practice. _get_pe() already tries static.pe first as a fallback; the operative data is at target.file.pe after denormalization. Verified on the live system: pe_cert_self_signed fires correctly.

com_process_activation.py: Noted — a follow-up can tighten the match to a suspicious-children set (mshta, powershell, cmd, wscript, etc.). Left as a known limitation in this PR.
