Add `read_panic_message` kipc #2313

hawkw · 2025-12-03T19:53:56Z

Currently, there is no way to programmatically access the panic message of a task which has faulted due to a Rust panic from within the Hubris userspace. This branch adds a new `read_panic_message` kipc that copies the contents of a panicked task's panic message buffer into a caller-provided buffer. If the requested task has not panicked, this kipc returns an error indicating this. This is intended for use by supervisor implementations or other tasks which wish to report panic messages from userspace.

I've also added a test case that exercises this functionality.

Fixes #2311
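To make the described behavior concrete, here is a minimal, host-side model of the semantics. It is a sketch only: `TaskState`, `ReadPanicMessageError`, and this `read_panic_message` function are hypothetical stand-ins written for illustration, not the kernel or userlib code from this branch. The 128-byte limit is taken from the review discussion.

```rust
/// Hypothetical stand-in for the kernel's view of a task (not the real type).
enum TaskState {
    Healthy,
    Panicked { message: Vec<u8> },
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ReadPanicMessageError {
    TaskNotPanicked,
}

/// Length of the preserved panic message buffer, per the review discussion.
const PANIC_MESSAGE_MAX_LEN: usize = 128;

/// Models the kipc: copy the panic message into `buf`, truncating if needed,
/// or report that the task has not panicked.
fn read_panic_message<'a>(
    task: &TaskState,
    buf: &'a mut [u8; PANIC_MESSAGE_MAX_LEN],
) -> Result<&'a [u8], ReadPanicMessageError> {
    let TaskState::Panicked { message } = task else {
        return Err(ReadPanicMessageError::TaskNotPanicked);
    };
    // Never copy more than either side can hold.
    let len = message.len().min(buf.len());
    buf[..len].copy_from_slice(&message[..len]);
    Ok(&buf[..len])
}
```

The returned slice borrows from `buf`, so the caller decides where the bytes live and the model never allocates on the read path.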

mkeeter · 2025-12-03T20:36:09Z

sys/kern/src/kipc.rs

+    if index >= tasks.len() {
+        return Err(UserError::Unrecoverable(FaultInfo::SyscallUsage(
+            UsageError::TaskOutOfRange,
+        )));
+    }


Similarly, something like

Suggested change

-    if index >= tasks.len() {
-        return Err(UserError::Unrecoverable(FaultInfo::SyscallUsage(
-            UsageError::TaskOutOfRange,
-        )));
-    }
+    let Some(task) = tasks.get(index) else {
+        return Err(UserError::Unrecoverable(FaultInfo::SyscallUsage(
+            UsageError::TaskOutOfRange,
+        )));
+    };

(then use task below instead of indexing repeatedly)
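As a standalone illustration of the suggested pattern, the sketch below uses `slice::get` with `let`/`else` so the bounds check and the lookup happen in one step. `Task`, `UsageError`, and `lookup` here are simplified, hypothetical stand-ins rather than the kernel's types.

```rust
/// Hypothetical, simplified error type for this sketch.
#[derive(Debug, PartialEq)]
enum UsageError {
    TaskOutOfRange,
}

/// Hypothetical, simplified task record.
struct Task {
    generation: u8,
}

/// Look up a task by index, returning an error instead of panicking when the
/// index is out of range. The binding `task` is then reused by the caller,
/// so no further indexing (and no further bounds checks) are needed.
fn lookup(tasks: &[Task], index: usize) -> Result<&Task, UsageError> {
    let Some(task) = tasks.get(index) else {
        return Err(UsageError::TaskOutOfRange);
    };
    Ok(task)
}
```

Compared with a separate `index >= tasks.len()` check, this form cannot drift out of sync with the indexing that follows it.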

Co-authored-by: Matt Keeter <matt@oxide.computer>

cbiffle · 2025-12-29T22:51:03Z

sys/kern/src/kipc.rs

+    let TaskState::Faulted {
+        fault: FaultInfo::Panic,
+        ..
+    } = task.state()
+    else {


huuuh, I've never seen a `let`/`else` used in a way that doesn't bind any names like this. I assume you prefer this over `if !matches!(...)`?

Yeah, it just felt a little bit better than

if !matches!(task.state(), TaskState::Faulted { fault: FaultInfo::Panic, .. }) { return ...; }

for some vague aesthetic reason. But it should be equivalent.
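The two spellings discussed here can be compared side by side. This is a sketch with hypothetical, cut-down `TaskState` and `FaultInfo` enums (the `Other` variant and the `restarts` field are invented for illustration); it only shows that a `let`/`else` which binds no names accepts the same values as `matches!`.

```rust
/// Hypothetical, cut-down fault kinds.
enum FaultInfo {
    Panic,
    Other,
}

/// Hypothetical, cut-down task state.
enum TaskState {
    Healthy,
    Faulted { fault: FaultInfo, restarts: u32 },
}

/// `let`/`else` used purely as a refutable check: the pattern binds nothing.
fn is_panicked_let_else(state: &TaskState) -> bool {
    let TaskState::Faulted { fault: FaultInfo::Panic, .. } = state else {
        return false;
    };
    true
}

/// The same check written with `matches!`.
fn is_panicked_matches(state: &TaskState) -> bool {
    matches!(state, TaskState::Faulted { fault: FaultInfo::Panic, .. })
}
```

One practical difference is layout: the `let`/`else` form keeps the early return in its own block, which rustfmt tends to wrap more readably when the pattern is long.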

cbiffle · 2025-12-29T22:52:09Z

sys/kern/src/kipc.rs

+        ..
+    } = task.state()
+    else {
+        return Err(UserError::Recoverable(


Do you have any code that uses this API yet? I suspect this error is going to be unwrap()'d, in which case we might want to eliminate it and have the kernel fault the caller if they got the task state wrong. (Which is to say, switch this to Unrecoverable.)

FWIW all the other errors in this implementation look right to me.

Hmm, I was imagining we might see this error in the event that a task responsible for collecting panic messages (i.e., packrat) gets pinged about a panic, but doesn't actually get to run before a timer in jefe elapses and the task gets restarted. In that case, the task would no longer have a panic message to share, and packrat (or whoever wants the panic message) has just missed its chance to read the panic. I would definitely not want to panic packrat in that case, so I felt like this error ought to be handleable without panicking.

If we don't end up implementing that behavior in the supervisor, and instead make it wait to be informed that a panic message has been collected before restarting the faulted task, it might make more sense to make this case a fault for the caller. But, the approach we discussed in #2309 (comment) made me feel inclined to go down the path of treating the restart cooldown in jefe as a timeout for collecting panic messages, so I was planning to do that (so that packrat or similar can't block other tasks from restarting indefinitely). And, even if we did decide to make jefe wait for panic messages to be recorded before restarting a task, making this a handleable error allows more flexibility for other (theoretical) supervisor implementations to do other things.
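The race described above is the reason a collector would want to treat "not panicked" as an ordinary outcome. A minimal sketch of that caller-side handling follows, assuming a hypothetical `TaskNotPanicked` error and a closure `read` standing in for the kipc call; none of these names come from packrat or jefe.

```rust
/// Hypothetical error mirroring the recoverable "task is not panicked" case.
#[derive(Debug, PartialEq)]
struct TaskNotPanicked;

/// What a collector records for one panic notification.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The message bytes were read before the task restarted.
    Collected(Vec<u8>),
    /// The task was already restarted, so the message is gone.
    Missed,
}

/// Handle one notification. `read` is a stand-in for the kipc: it fills the
/// buffer and returns the message length, or reports `TaskNotPanicked`.
fn on_panic_notification<F>(read: F) -> Outcome
where
    F: FnOnce(&mut [u8; 128]) -> Result<usize, TaskNotPanicked>,
{
    let mut buf = [0u8; 128];
    match read(&mut buf) {
        // Clamp the reported length so a bad value cannot cause a panic here.
        Ok(len) => Outcome::Collected(buf[..len.min(buf.len())].to_vec()),
        // Losing the race with the restart timer is not a bug in the caller.
        Err(TaskNotPanicked) => Outcome::Missed,
    }
}
```

If the error were unrecoverable instead, the `Missed` arm would not exist and losing the race would fault the collector itself.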

cbiffle · 2025-12-29T22:53:01Z

sys/kern/src/kipc.rs

+    // using the `userlib::ipc::read_panic_message()` wrapper, then the caller's
+    // buffer will always be exactly 128 bytes long. However, we can't rely on
+    // that here, as either task *could* be an arbitrary binary that wasn't
+    // compiled with the Hubris userlib, so we need to be safe regardless.
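The defensive copy that the quoted comment calls for can be sketched as follows. `copy_bounded` is a hypothetical helper operating on plain slices, not the kernel's actual copy routine: it trusts neither length and copies only what both sides can hold.

```rust
/// Copy as many bytes as fit from `src` into `dst` and return the count.
/// Neither length is assumed to be 128; the shorter side wins.
fn copy_bounded(src: &[u8], dst: &mut [u8]) -> usize {
    let len = src.len().min(dst.len());
    dst[..len].copy_from_slice(&src[..len]);
    len
}
```

Returning the number of bytes copied lets the caller distinguish a short message from a full buffer without scanning for a terminator.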


cbiffle · 2025-12-29T22:59:39Z

sys/userlib/src/kipc.rs

+///
+/// Note that Hubris only preserves the first [`PANIC_MESSAGE_MAX_LEN`] bytes of
+/// a task's panic message, and panic messages greater than that length are
+/// truncated. Thus, this function accepts a buffer of that length.


Just wanted to note that we could most likely loosen this later, if required, by changing this to a &mut [u8] and not breaking callers.

Yup, that was my thinking as well.
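The reason loosening the signature later should be source-compatible for typical call sites is that `&mut [u8; N]` coerces to `&mut [u8]` at a function call. The sketch below shows the same call expression compiling against both signatures; `fill_fixed` and `fill_slice` are hypothetical functions written for this illustration.

```rust
const PANIC_MESSAGE_MAX_LEN: usize = 128;

/// Today's shape: the buffer length is fixed in the type.
fn fill_fixed(buf: &mut [u8; PANIC_MESSAGE_MAX_LEN]) -> &[u8] {
    buf[0] = b'!';
    &buf[..1]
}

/// A possible later shape: any buffer length is accepted.
fn fill_slice(buf: &mut [u8]) -> &[u8] {
    // Handle a zero-length buffer instead of indexing unconditionally.
    let len = buf.len().min(1);
    buf[..len].fill(b'!');
    &buf[..len]
}
```

A caller that writes `fill(&mut buf)` with `buf: [u8; 128]` needs no changes when the parameter type changes, although call sites that rely on type inference from the parameter could behave differently.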

cbiffle · 2025-12-29T23:00:36Z

sys/userlib/src/kipc.rs

+///
+/// - [`Ok`]`(&[u8])` if the task is panicked. The returned slice is borrowed
+///   from `buf`, and contains the task's panic message as a sequence of
+///   UTF-8 bytes. Note that the slice may be empty, if the task has panicked


Technically, nothing ensures that the contents of that slice are UTF-8, so I wouldn't advertise that. The panic message is normally expected to be UTF-8 but it may be truncated in the middle of a multi-byte sequence, and the task is panicking so strictly speaking it could have stomped all over its RAM, producing garbage.

This is another case where seeing a caller might clarify the API design -- it might make sense to call utf8_chunks on the buffer and return the iterator that results so that the caller can easily step over valid and invalid sections if desired. Or, callers may just byte-copy the result into CBOR and not parse it at all (in which case it's quite important that it not hit CBOR as UTF-8!).

Yeah. So, I was planning to probably just encode the valid regions as a CBOR string in the caller which I haven't written yet. But, I had a vague sense that the KIPC API ought not make too many decisions about what the caller wants; I had kind of been considering having a higher-level read_panic_message_utf8 or something that just gives you the valid UTF-8 regions. The sense I got from the other KIPC APIs is that they don't tend to do much beyond what's necessary to actually send and receive the IPC from the kernel, so I wanted to be consistent with that.

On the other hand, I suspect basically all callers are going to want to get a string out and not want anything that isn't UTF-8, so maybe this API should just do the higher-level behavior. I think you're starting to sell me on that.
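A higher-level helper of the kind floated here could be sketched with `utf8_chunks`, which is available on byte slices in recent stable Rust. The name `valid_utf8_regions` is hypothetical, and this version allocates a `String` for simplicity, which a `no_std` Hubris task would likely avoid.

```rust
/// Concatenate only the valid UTF-8 regions of `bytes`, skipping any invalid
/// bytes (for example, a multi-byte sequence cut off by truncation).
fn valid_utf8_regions(bytes: &[u8]) -> String {
    let mut out = String::new();
    for chunk in bytes.utf8_chunks() {
        // `valid()` is the well-formed part of this chunk; the bytes in
        // `invalid()` are dropped by this sketch.
        out.push_str(chunk.valid());
    }
    out
}
```

A caller that prefers to mark the gaps rather than drop them could push U+FFFD whenever `chunk.invalid()` is non-empty.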

cbiffle · 2025-12-29T23:07:02Z

test/test-suite/src/main.rs

+            .unwrap();
+    // it should look kinda like a panic message (but since the line number may
+    // change, don't make assertions about the entire contents of the string...
+    assert!(core::str::from_utf8(msg)


I'm not sure the expect here is right -- assuming the assistant panics with an ASCII-only message (which it likely does) you could still get this test to fail by running it in a deep path containing non-ASCII characters.

Since in the end you're only checking the prefix of the string, you can get the valid UTF-8 prefix using

msg.utf8_chunks().next().unwrap().valid()

(the unwrap() there is an unfortunate detail of the utf8_chunks implementation, which is defined as returning an iterator yielding at least one chunk... but the type system can't see that.)

Ah, that's a good point, thanks!
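For the prefix check discussed in this thread, a small helper might look like the sketch below. `valid_utf8_prefix` is a hypothetical name. It uses `unwrap_or("")` rather than `unwrap()` because, as far as I can tell, the `utf8_chunks` iterator yields no chunks at all for an empty slice, and the message is documented as possibly empty.

```rust
/// Return the longest valid UTF-8 prefix of `bytes` as a `&str`.
fn valid_utf8_prefix(bytes: &[u8]) -> &str {
    bytes
        .utf8_chunks()
        .next()
        .map(|chunk| chunk.valid())
        // No chunks (empty input) means the valid prefix is empty.
        .unwrap_or("")
}
```

With this, a test can assert on `valid_utf8_prefix(msg).starts_with(...)` and stay robust to non-UTF-8 bytes later in the message, such as those from an unusual build path.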

Co-authored-by: Cliff L. Biffle <cliff@oxide.computer>

cbiffle · 2025-12-29T23:25:05Z

sys/kern/src/kipc.rs

+    };
+
+    let Ok(message) = task.save().as_panic_args().message else {
+        return Err(UserError::Recoverable(


(In case anyone's curious, I was musing on how this could fail in practice. Because we're getting a USlice<u8>, it can't be misaligned. In practice, the only "invalid" combination that will trigger this specific check is: a slice that spans the end of the address space, such that base + len would overflow.)

I should probably write that down here, huh?
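The failure mode described above (a slice whose end would wrap past the top of the address space) can be modeled with checked arithmetic. This is a sketch assuming a 32-bit address space and a hypothetical `slice_fits` helper; it is not the kernel's actual `USlice` validation.

```rust
/// Report whether a byte slice of `len` bytes starting at `base` stays within
/// a 32-bit address space, i.e. whether `base + len` can be computed without
/// overflow.
fn slice_fits(base: u32, len: u32) -> bool {
    base.checked_add(len).is_some()
}
```

For `u8` elements there is no alignment requirement to violate, which is why overflow is the only remaining way for such a descriptor to be rejected in this model.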

Co-authored-by: Cliff L. Biffle <cliff@oxide.computer>

hawkw requested a review from cbiffle December 3, 2025 19:53

hawkw commented Dec 3, 2025

hawkw force-pushed the eliza/read-panic-message branch 2 times, most recently from c4ca702 to 2c80a58 Compare December 3, 2025 19:56

hawkw force-pushed the eliza/read-panic-message branch from 2c80a58 to 5ba535a Compare December 3, 2025 20:27

mkeeter reviewed Dec 3, 2025

mkeeter reviewed Dec 3, 2025

Update kipc.rs

1313935

Co-authored-by: Matt Keeter <matt@oxide.computer>

mkeeter reviewed Dec 3, 2025

labbott reviewed Dec 3, 2025

hawkw added 2 commits December 3, 2025 13:59

review feedback + tidiness

07460ec

CLIPPY DAMAGE

68a2fb7

hawkw added the kernel, userlib, and fault-management labels Dec 4, 2025

hawkw self-assigned this Dec 4, 2025

hawkw added 2 commits December 4, 2025 11:35

Merge branch 'master' into eliza/read-panic-message

2ab7a43

Merge branch 'master' into eliza/read-panic-message

8a221c1

cbiffle reviewed Dec 29, 2025

cbiffle reviewed Dec 29, 2025

add note on UTF-8 truncation

425c221

Co-authored-by: Cliff L. Biffle <cliff@oxide.computer>

cbiffle reviewed Dec 29, 2025

hawkw and others added 4 commits December 29, 2025 17:04

Update kipc.adoc

3748fda

Co-authored-by: Cliff L. Biffle <cliff@oxide.computer>

Update kipc.adoc

09fe93f

Co-authored-by: Cliff L. Biffle <cliff@oxide.computer>

Update main.rs

e468ef7

Co-authored-by: Cliff L. Biffle <cliff@oxide.computer>

More of @cbiffle's docs suggestions

7b4a6bb

Co-authored-by: Cliff L. Biffle <cliff@oxide.computer>
