Files
allegro-checkpoint/skills/github/github-pr-workflow/references/ci-troubleshooting.md
2026-04-01 11:04:00 +00:00

184 lines
4.8 KiB
Markdown

# CI Troubleshooting Quick Reference
Common CI failure patterns and how to diagnose them from the logs.
## Reading CI Logs
```bash
# With gh
gh run view <RUN_ID> --log-failed
# With curl — download and extract
curl -sL -H "Authorization: token $GITHUB_TOKEN" \
https://api.github.com/repos/$GH_OWNER/$GH_REPO/actions/runs/<RUN_ID>/logs \
-o /tmp/ci-logs.zip && unzip -o /tmp/ci-logs.zip -d /tmp/ci-logs
```
## Common Failure Patterns
### Test Failures
**Signatures in logs:**
```
FAILED tests/test_foo.py::test_bar - AssertionError
E assert 42 == 43
ERROR tests/test_foo.py - ModuleNotFoundError
```
**Diagnosis:**
1. Find the test file and line number from the traceback
2. Use `read_file` to read the failing test
3. Check if it's a logic error in the code or a stale test assertion
4. Look for `ModuleNotFoundError` — usually a missing dependency in CI
**Common fixes:**
- Update assertion to match new expected behavior
- Add missing dependency to requirements.txt / pyproject.toml
- Fix flaky test (add retry, mock external service, fix race condition)
---
### Lint / Formatting Failures
**Signatures in logs:**
```
src/auth.py:45:1: E302 expected 2 blank lines, got 1
src/models.py:12:80: E501 line too long (95 > 88 characters)
error: would reformat src/utils.py
```
**Diagnosis:**
1. Read the specific file:line numbers mentioned
2. Check which linter is complaining (flake8, ruff, black, isort, mypy)
**Common fixes:**
- Run the formatter locally: `black .`, `isort .`, `ruff check --fix .`
- Fix the specific style violation by editing the file
- If using `patch`, make sure to match existing indentation style
---
### Type Check Failures (mypy / pyright)
**Signatures in logs:**
```
src/api.py:23: error: Argument 1 to "process" has incompatible type "str"; expected "int"
src/models.py:45: error: Missing return statement
```
**Diagnosis:**
1. Read the file at the mentioned line
2. Check the function signature and what's being passed
**Common fixes:**
- Add type cast or conversion
- Fix the function signature
- Add `# type: ignore` comment as last resort (with explanation)
---
### Build / Compilation Failures
**Signatures in logs:**
```
ModuleNotFoundError: No module named 'some_package'
ERROR: Could not find a version that satisfies the requirement foo==1.2.3
npm ERR! Could not resolve dependency
```
**Diagnosis:**
1. Check requirements.txt / package.json for the missing or incompatible dependency
2. Compare local vs CI Python/Node version
**Common fixes:**
- Add missing dependency to requirements file
- Pin compatible version
- Update lockfile (`pip freeze`, `npm install`)
---
### Permission / Auth Failures
**Signatures in logs:**
```
fatal: could not read Username for 'https://github.com': No such device or address
Error: Resource not accessible by integration
403 Forbidden
```
**Diagnosis:**
1. Check if the workflow needs special permissions (token scopes)
2. Check if secrets are configured (missing `GITHUB_TOKEN` or custom secrets)
**Common fixes:**
- Add `permissions:` block to workflow YAML
- Verify secrets exist: `gh secret list` or check repo settings
- For fork PRs: some secrets aren't available by design
---
### Timeout Failures
**Signatures in logs:**
```
Error: The operation was canceled.
The job running on runner ... has exceeded the maximum execution time
```
**Diagnosis:**
1. Check which step timed out
2. Look for infinite loops, hung processes, or slow network calls
**Common fixes:**
- Add timeout to the specific step: `timeout-minutes: 10`
- Fix the underlying performance issue
- Split into parallel jobs
---
### Docker / Container Failures
**Signatures in logs:**
```
docker: Error response from daemon
failed to solve: ... not found
COPY failed: file not found in build context
```
**Diagnosis:**
1. Check Dockerfile for the failing step
2. Verify the referenced files exist in the repo
**Common fixes:**
- Fix path in COPY/ADD command
- Update base image tag
- Add missing file to `.dockerignore` exclusion or remove from it
---
## Auto-Fix Decision Tree
```
CI Failed
├── Test failure
│ ├── Assertion mismatch → update test or fix logic
│ └── Import/module error → add dependency
├── Lint failure → run formatter, fix style
├── Type error → fix types
├── Build failure
│ ├── Missing dep → add to requirements
│ └── Version conflict → update pins
├── Permission error → update workflow permissions (needs user)
└── Timeout → investigate perf (may need user input)
```
## Re-running After Fix
```bash
git add <fixed_files> && git commit -m "fix: resolve CI failure" && git push
# Then monitor
gh pr checks --watch 2>/dev/null || \
echo "Poll with: curl -s -H 'Authorization: token ...' https://api.github.com/repos/.../commits/$(git rev-parse HEAD)/status"
```