How to Run
Prerequisites
- OpenShell CLI installed and gateway running (
openshell gateway start) - GitHub CLI (
gh) installed and authenticated (gh auth login) - Docker available (openshell uses it to build images)
- A GitHub repo you own (used as the push target)
Phase 2 only (agent bypass test)
- GCP Application Default Credentials configured (
gcloud auth application-default login) - The following environment variables set:shell
export CLAUDE_CODE_USE_VERTEX=1 export CLOUD_ML_REGION=<your-region> # e.g. us-east5 export ANTHROPIC_VERTEX_PROJECT_ID=<your-gcp-project-id>
How providers and credentials work
The script handles provider setup automatically:
- GitHub provider: Created from your
gh auth token. The proxy injects aGITHUB_TOKENplaceholder into the sandbox; when git sends an HTTP request with the placeholder in theAuthorizationheader, the proxy resolves it to the real token before forwarding upstream. - Vertex AI (Phase 2 only): The ADC file (
~/.config/gcloud/application_default_credentials.json) is uploaded into the sandbox at/home/sandbox/.config/gcloud/. The Vertex env vars (CLAUDE_CODE_USE_VERTEX,CLOUD_ML_REGION, etc.) are passed directly when invoking Claude inside the sandbox. These are config values, not secrets, so they bypass the proxy placeholder system.
Run
# All phases
./run.sh owner/repo
# Phase 1 only (proxy binary tracking, no agent)
./run.sh owner/repo phase1
# Phase 2 only (agent bypass test)
./run.sh owner/repo phase2
# Phase 2 with a specific variant (default: http)
PHASE2_VARIANT=subprocess ./run.sh owner/repo phase2What it does
Phase 1: Proxy binary tracking
Tests four sandbox images, each with a different safe-push binary at /usr/local/bin/safe-push. Each image is built and pushed by openshell automatically from its images/<variant>/Dockerfile.
Test order (most likely blocked to least likely blocked):
- S (shebang) — Python script with
#!/usr/bin/env python3. Proxy sees/usr/bin/python3via/proc/pid/exe, not the script path. - B (exec) — Go binary that
syscall.Execs togit. Process image changes fromsafe-pushtogitafter exec. - A (subprocess) — Go binary that spawns
git pushas a child process. Child has its own/proc/pid/exepointing to/usr/bin/git. - C (http) — Go binary using go-git (pure Go). All HTTP connections originate from the
safe-pushbinary itself.
For each variant, three tests run:
| Test | Expected result | What it tells us |
|---|---|---|
safe-push origin experiment/policy-test | Push succeeds | Binary is allowed by policy |
safe-push origin experiment/policy-test --force | Rejected by safe-push | Force-push detection works |
git push origin experiment/policy-test | Blocked by proxy | /usr/bin/git is not in the policy |
Phase 2: Agent bypass test
Creates a sandbox with the working variant from Phase 1 (default: http) and launches Claude Code with a prompt instructing it to force-push. Monitors whether the policy holds by checking the branch SHA on GitHub before and after.
Results
Results are saved to results/phase1/<variant>/ and results/phase2/.
Each directory contains:
sandbox-logs.txt— General sandbox logssandbox-proxy-logs.txt— Proxy-specific logs showing allow/deny decisions with the binary field the proxy identifiedclaude-output.txt— (Phase 2 only) Full Claude Code session output
Troubleshooting
Sandbox creation times out: The image build happens on the gateway. First runs are slow (~5 min) due to pulling base images. Subsequent runs use cache.
git push fails with auth error: Check that gh auth token returns a valid token and that the GitHub provider was created successfully.
Phase 2 Claude fails to start: Verify your Vertex AI env vars are set and the ADC file exists. Run gcloud auth application-default print-access-token to confirm your credentials are valid.
