CUDA Runtime Error and Restore

Yujie Liu | Less than 1 minute | Computer Science, CUDA, GPGPU

The CUDA Runtime reports two types of errors: sticky and non-sticky.

CUDA Error: Sticky vs. Non-sticky

| | Sticky | Non-sticky |
| --- | --- | --- |
| Description | A sticky error corrupts the CUDA context, after which behavior is undefined. It is easy to recognize because it "sticks": once it occurs, every subsequent CUDA API call returns that same error until the context is destroyed. | A non-sticky error is cleared automatically after it has been returned by a CUDA API call. |
| Examples | Any "crashed kernel" type of error (invalid access, unspecified launch failure, etc.). | An attempt to cudaMalloc more memory than is available on the device, which returns an out-of-memory error. |
| How to recover | The only way to restore proper device functionality after a non-recoverable ("sticky") CUDA error is to terminate the host process that triggered it. | Nothing special is required: the error is cleared after being returned, and subsequent (valid) CUDA API calls complete successfully without returning an error. |
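
The contrast described above can be observed in a small experiment. The sketch below is an illustration, not code taken from the referenced posts. It assumes a Linux/POSIX host, a CUDA-capable GPU, and compilation with nvcc; the names crash_kernel, sticky_demo, and non_sticky_demo are hypothetical. A child process triggers a sticky error and is then terminated, after which the parent creates its own context and shows that a failed oversized cudaMalloc does not stop later calls from succeeding.

```cuda
#include <cstdio>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cuda_runtime.h>

// Hypothetical kernel that deliberately writes through a null pointer.
__global__ void crash_kernel(int *p) { *p = 42; }

// Sticky case: a crashed kernel leaves the context unusable.
// Returns true if both the synchronize and the follow-up call fail.
static bool sticky_demo() {
    crash_kernel<<<1, 1>>>(nullptr);
    cudaError_t err = cudaDeviceSynchronize();  // the kernel fault surfaces here
    printf("[child] after crashed kernel: %s\n", cudaGetErrorString(err));

    void *p = nullptr;
    cudaError_t next = cudaMalloc(&p, 1024);    // a valid request on its own
    printf("[child] follow-up cudaMalloc: %s\n", cudaGetErrorString(next));
    return err != cudaSuccess && next != cudaSuccess;
}

// Non-sticky case: an oversized allocation fails, the context stays usable.
// Returns true if the small allocation after the failure succeeds.
static bool non_sticky_demo() {
    void *big = nullptr;
    cudaError_t err = cudaMalloc(&big, (size_t)1 << 60);  // assumed to exceed device memory
    printf("[parent] oversized cudaMalloc: %s\n", cudaGetErrorString(err));
    (void)cudaGetLastError();                   // reset the runtime's recorded last error

    void *small = nullptr;
    cudaError_t next = cudaMalloc(&small, 1024);
    printf("[parent] follow-up cudaMalloc: %s\n", cudaGetErrorString(next));
    if (next == cudaSuccess) cudaFree(small);
    return err != cudaSuccess && next == cudaSuccess;
}

int main() {
    pid_t pid = fork();                         // fork before the parent touches CUDA
    if (pid < 0) { perror("fork"); return EXIT_FAILURE; }

    if (pid == 0) {                             // child: owns the context that gets corrupted
        bool stuck = sticky_demo();
        fflush(stdout);
        _exit(stuck ? 0 : 1);                   // terminating the process is the recovery step
    }

    int status = 0;
    waitpid(pid, &status, 0);
    bool child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    printf("[parent] child saw a sticky error: %s\n", child_ok ? "yes" : "no");

    bool ok = non_sticky_demo();                // parent: fresh context in a healthy process
    return ok ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Forking before the parent makes any CUDA call is deliberate: a CUDA context does not carry over into a forked child, so each process here initializes the runtime on its own. Isolating risky kernels in a worker process this way is one possible design for keeping a long-running host process alive, at the cost of re-creating the context and re-uploading data after each failure.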

References: here and here.

Contributors: Yujie