hotfix(cli): replace raw subprocess tracebacks with styled error messages and fix restart race condition #96

Merged
zach merged 5 commits from hotfix-restart-race-condition into main 2026-05-24 18:23:43 +02:00
Collaborator

Problem

All commands in shell.py use subprocess.run(..., check=True). When qs exits non-zero, CalledProcessError raises with an ugly raw traceback instead of a clean error message.

Additionally, restart sends a kill signal but immediately tries to start a new instance without waiting for the old process to release its lock file, producing: An instance of this configuration is already running.

And start silently succeeds when an instance is already running — qs exits 0 without flagging the conflict.

Changes

Error handling (all commands)

Replace check=True with capture_output=True, check the return code, and raise click.ClickException with qs's error message. Typer renders this as a rich styled error panel matching the rest of the CLI:
╭─ Error ─────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ No running instance to kill. │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

start — instance conflict detection

Add pre-check via qs -c zshell ipc show (exit 0 = running). If an instance is already running, show error immediately instead of silently succeeding.

restart — race condition fix

After killing, poll qs -c zshell kill up to 50× (2.5s, 50ms intervals) until exit code 255 (no instances), ensuring the old process fully terminated and released its lock before launching the new instance.

Files changed

cli/src/zshell/subcommands/shell.py — all 7 commands (kill, start, restart, show, log, lock, call)

## Problem All commands in `shell.py` use `subprocess.run(..., check=True)`. When `qs` exits non-zero, `CalledProcessError` raises with an ugly raw traceback instead of a clean error message. Additionally, `restart` sends a kill signal but immediately tries to start a new instance without waiting for the old process to release its lock file, producing: `An instance of this configuration is already running.` And `start` silently succeeds when an instance is already running — `qs` exits 0 without flagging the conflict. ## Changes ### Error handling (all commands) Replace `check=True` with `capture_output=True`, check the return code, and raise `click.ClickException` with `qs`'s error message. Typer renders this as a rich styled error panel matching the rest of the CLI: ╭─ Error ─────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ No running instance to kill. │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ ### `start` — instance conflict detection Add pre-check via `qs -c zshell ipc show` (exit 0 = running). If an instance is already running, show error immediately instead of silently succeeding. ### `restart` — race condition fix After killing, poll `qs -c zshell kill` up to 50× (2.5s, 50ms intervals) until exit code 255 (no instances), ensuring the old process fully terminated and released its lock before launching the new instance. ### Files changed `cli/src/zshell/subcommands/shell.py` — all 7 commands (kill, start, restart, show, log, lock, call)
AramJonghu added 2 commits 2026-05-23 20:51:43 +02:00
minor typer adjustments to use typer in error/exception throws
Lint & Format (JS/TS) / lint-format (pull_request) Successful in 14s
Python / lint-format (pull_request) Successful in 34s
Python / test (pull_request) Failing after 53s
Lint & Format (Rust) / lint-format (pull_request) Successful in 1m48s
b49165e7ea
AramJonghu requested review from zach 2026-05-23 20:51:43 +02:00
AramJonghu added 1 commit 2026-05-23 20:54:40 +02:00
tests did not match changed code logic
Lint & Format (JS/TS) / lint-format (pull_request) Successful in 11s
Python / lint-format (pull_request) Successful in 25s
Python / test (pull_request) Successful in 50s
Lint & Format (Rust) / lint-format (pull_request) Successful in 1m46s
5e9b373405
Author
Collaborator

Since the text formatting does not show it well, here is an image.
image.png

Since the text formatting does not show it well, here is an image. ![image.png](/attachments/28a2cca5-f7f8-45d4-ad1d-7fb42c4cbfea)
108 KiB
AramJonghu changed title from fix(cli): replace raw subprocess tracebacks with styled error messages and fix restart race condition to hotfix(cli): replace raw subprocess tracebacks with styled error messages and fix restart race condition 2026-05-23 21:08:07 +02:00
zach requested changes 2026-05-24 01:53:09 +02:00
Dismissed
zach left a comment
Owner

The IPC call in the start() function is the most pressing thing we should remove in my opinion. There is no need for a check like that when Quickshell has one built-in.

The IPC call in the `start()` function is the most pressing thing we should remove in my opinion. There is no need for a check like that when Quickshell has one built-in.
@@ -14,3 +21,3 @@
@app.command()
def start(no_daemon: bool = False):
subprocess.run(args + ["-n"] + ([] if no_daemon else ["-d"]), check=True)
check = subprocess.run(args + ["ipc"] + ["show"], capture_output=True)
Owner

The start command should not need to run qs -c zshell ipc call, IPC calls are used to trigger states and functions within the QML code. The -n flag already checks if there's a running instance of the shell.

The start command should not need to run `qs -c zshell ipc call`, IPC calls are used to trigger states and functions within the QML code. The `-n` flag already checks if there's a running instance of the shell.
Author
Collaborator

Was not aware of this. Remaining functions use ipc, which is why I did not bat an eye. Resolving now.

Was not aware of this. Remaining functions use ipc, which is why I did not bat an eye. Resolving now.
Author
Collaborator

Change occurred since I wanted an error block such as in the screenshot. But with that info, ipc is indeed not needed in the code.

Change occurred since I wanted an error block such as in the screenshot. But with that info, ipc is indeed not needed in the code.
zach marked this conversation as resolved
@@ -21,2 +34,2 @@
subprocess.run(args + ["kill"], check=False)
subprocess.run(args + ["-n"] + ([] if no_daemon else ["-d"]), check=True)
subprocess.run(args + ["kill"])
for _ in range(50):
Owner

This seems ugly. Personally think we should just set a sleep timer of ~5 seconds, then run code.

This seems ugly. Personally think we should just set a sleep timer of ~5 seconds, then run code.
Author
Collaborator

Is that not an messier implementation? A user would want a restart to be quick (quicker than running the kill and start command separately). 2.5 or higher stock sleep timer would make little sense.

time.monotonic() could be used to hide the ugliness, but same logic/idea.

Is that not an messier implementation? A user would want a restart to be quick (quicker than running the kill and start command separately). 2.5 or higher stock sleep timer would make little sense. time.monotonic() could be used to hide the ugliness, but same logic/idea.
Owner

Running a loop would keep a CPU thread busy until it's able to launch the shell, while a sleep timer wouldn't. I don't really see much benefit in looping upwards of 50 times just to make sure we can run the start command exactly when it's available.

Running a loop would keep a CPU thread busy until it's able to launch the shell, while a sleep timer wouldn't. I don't really see much benefit in looping upwards of 50 times just to make sure we can run the start command exactly when it's available.
Author
Collaborator

Is that not okay for a rare command as restart?

We could change the loop to check every 250ms instead, so it is 10 checks instead of 50.

Is that not okay for a rare command as restart? We could change the loop to check every 250ms instead, so it is 10 checks instead of 50.
Owner

That's a good middle ground I think, agreed.

That's a good middle ground I think, agreed.
Author
Collaborator

Changed it.

Changed it.
AramJonghu marked this conversation as resolved
@@ -23,0 +37,4 @@
if result.returncode == 255:
break
time.sleep(0.05)
result = subprocess.run(args + ["-n"] + ([] if no_daemon else ["-d"]), capture_output=True)
Owner

Here we can just call start(), right? Not sure if Typer allows calling command functions.

Here we can just call `start()`, right? Not sure if `Typer` allows calling command functions.
Author
Collaborator

It was to have separate tests. Simpler in my eyes (at least at time of writing). I can simplify it by having a start_instance() that start uses outright, and is reused by restart. This could satisfy testing and reuse of code, and perhaps future code changes.

It was to have separate tests. Simpler in my eyes (at least at time of writing). I can simplify it by having a `start_instance()` that start uses outright, and is reused by restart. This could satisfy testing and reuse of code, and perhaps future code changes.
zach marked this conversation as resolved
AramJonghu added 1 commit 2026-05-24 03:09:23 +02:00
refactor(cli): clean shell start/restart, drop redundant ipc check
Lint & Format (JS/TS) / lint-format (pull_request) Successful in 11s
Python / lint-format (pull_request) Successful in 24s
Python / test (pull_request) Successful in 48s
Lint & Format (Rust) / lint-format (pull_request) Successful in 1m44s
78fcf33b3a
AramJonghu requested review from zach 2026-05-24 03:10:13 +02:00
Author
Collaborator

You may resolve conversations if satisfied.

You may resolve conversations if satisfied.
AramJonghu added 1 commit 2026-05-24 18:03:36 +02:00
check every 50ms -> 250ms for restart
Lint & Format (JS/TS) / lint-format (pull_request) Successful in 11s
Python / lint-format (pull_request) Successful in 30s
Python / test (pull_request) Successful in 46s
Lint & Format (Rust) / lint-format (pull_request) Successful in 1m44s
c30128cf95
zach approved these changes 2026-05-24 18:23:37 +02:00
zach merged commit 9ca46967d9 into main 2026-05-24 18:23:43 +02:00
Sign in to join this conversation.