Add Thor-specific GPU support with 3D scaling and Railgate control#738
Conversation
- Added jtop/core/thor_power.py and jtop/core/thor_gpu.py to expose
sysfs interfaces for Jetson Thor GPU devfreq and runtime power control.
Includes helper functions for governor and rail-gating state toggling.
- Added jtop/gui/pgpu_thor.py providing a dedicated GUI page with clickable
{3D scaling} and {Railgate} fields for Thor systems.
- Updated jtop/gui/__init__.py to auto-detect Thor hardware using
devfreq_nodes() and import pgpu_thor dynamically when available.
- Ensures 3D scaling and Railgate states are reflected live and toggled
through sysfs (/sys/class/devfreq/gpu-gpc-0/governor and
/sys/bus/pci/devices/0000:01:00.0/power/control).
Implements full Jetson Thor GPU control integration within jtop.
Reviewer's GuideThis PR adds full Jetson Thor GPU control by introducing low-level sysfs helpers for devfreq governors and rail-gating, wrapping them in a Thor-specific GPU interface and service, dynamically detecting Thor hardware to swap in the new classes, and extending the GUI with a dedicated, interactive page to toggle 3D scaling and Railgate live. Class diagram for new Thor GPU and power control interfacesclassDiagram
class thor_power {
+devfreq_nodes() List[str]
+current_governor() Optional[str]
+set_governor(gov: str) (bool, Optional[str])
+toggle_governor() (bool, Optional[str])
+rail_status() Dict
+set_rail(allow_idle: bool) (bool, Optional[str])
+toggle_rail() (bool, Optional[str])
}
class thor_gpu {
+is_thor() bool
+GPU
+GPUService
}
class GPU {
+set_scaling_3D(name: str, value: bool)
+get_scaling_3D(name: str) bool
+scaling_3D bool
+set_railgate(name: str, value: bool)
+get_railgate(name: str) bool
}
class GPUService {
+set_scaling_3D(name: str, value: bool)
+set_railgate(name: str, value: bool)
+get_status() Dict[str, Dict[str, Any]]
}
thor_gpu --> thor_power : uses
GPU <|-- GPUService : service pattern
GPUService --> thor_power : uses
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey there - I've reviewed your changes - here's some feedback:
- The PR uses two different hardware detection methods (devfreq_nodes() in the GUI and is_thor() in core); consider consolidating into a single shared detection utility to keep front-end and back-end in sync.
- The clickable region and mouse-handling code is duplicated between compact_gpu and GPU.draw; refactoring this into a common helper would reduce duplication and improve maintainability.
- Dynamic imports and toggle_* functions currently catch all Exceptions, which can hide real errors—narrow the exception handling or add logging to aid future debugging.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The PR uses two different hardware detection methods (devfreq_nodes() in the GUI and is_thor() in core); consider consolidating into a single shared detection utility to keep front-end and back-end in sync.
- The clickable region and mouse-handling code is duplicated between compact_gpu and GPU.draw; refactoring this into a common helper would reduce duplication and improve maintainability.
- Dynamic imports and toggle_* functions currently catch all Exceptions, which can hide real errors—narrow the exception handling or add logging to aid future debugging.
## Individual Comments
### Comment 1
<location> `jtop/gui/__init__.py:29-33` </location>
<code_context>
from .core.nvpmodel import NVPModel
from .jtop import jtop
+try:
+ from .core.thor_gpu import is_thor
+ if is_thor():
</code_context>
<issue_to_address>
**issue (bug_risk):** Catching all exceptions may mask real errors during Thor detection.
Catching Exception may suppress unrelated errors. Please catch only specific exceptions relevant to Thor detection, such as ImportError or AttributeError.
</issue_to_address>
### Comment 2
<location> `jtop/__init__.py:27-34` </location>
<code_context>
from .core.nvpmodel import NVPModel
from .jtop import jtop
+try:
+ from .core.thor_gpu import is_thor
+ if is_thor():
+ from .core.thor_gpu import GPU, GPUService
+ else:
+ from .core.gpu import GPU
+except Exception:
+ from .gpu import GPU
+
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Catching all exceptions during GPU import may obscure real import errors.
Catching Exception may hide important errors like missing dependencies or syntax issues. Please catch only the specific exceptions related to Thor detection.
</issue_to_address>
### Comment 3
<location> `jtop/core/thor_gpu.py:150-154` </location>
<code_context>
+ rs = rail_status()
+ return rs.get("control_value") == "auto"
+
+ def _get_first_integrated_gpu(self) -> str:
+ for name in self._data:
+ if self._data[name]["type"] == "integrated":
+ return name
+ return ""
+
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Returning an empty string for missing integrated GPU may lead to silent failures.
Consider raising an exception or returning None instead to make error handling explicit.
Suggested implementation:
```python
def _get_first_integrated_gpu(self) -> Optional[str]:
for name in self._data:
if self._data[name]["type"] == "integrated":
return name
return None
```
If you prefer to raise an exception instead of returning `None`, replace the last line with:
```python
raise JtopException("No integrated GPU found")
```
and update the return type hint to just `str`.
Also, ensure that any code calling `_get_first_integrated_gpu` is updated to handle `None` or the exception appropriately.
</issue_to_address>
### Comment 4
<location> `jtop/core/thor_gpu.py:124-126` </location>
<code_context>
@property
def scaling_3D(self) -> bool:
name = self._get_first_integrated_gpu()
if not name:
raise JtopException("no Integrated GPU available")
return self.get_scaling_3D(name)
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
</issue_to_address>
### Comment 5
<location> `jtop/core/thor_gpu.py:131-133` </location>
<code_context>
@scaling_3D.setter
def scaling_3D(self, value: bool):
name = self._get_first_integrated_gpu()
if not name:
raise JtopException("no Integrated GPU available")
self.set_scaling_3D(name, value)
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
</issue_to_address>
### Comment 6
<location> `jtop/core/thor_gpu.py:140` </location>
<code_context>
def set_railgate(self, name: str, value: bool):
if name not in self._data:
raise JtopException(f'GPU "{name}" does not exist')
ok, err = set_rail(bool(value))
if not ok:
raise JtopException(err or "Failed to set rail-gating")
</code_context>
<issue_to_address>
**suggestion (code-quality):** Remove unnecessary casts to int, str, float or bool ([`remove-unnecessary-cast`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-cast/))
```suggestion
ok, err = set_rail(value)
```
</issue_to_address>
### Comment 7
<location> `jtop/core/thor_gpu.py:151-154` </location>
<code_context>
def _get_first_integrated_gpu(self) -> str:
for name in self._data:
if self._data[name]["type"] == "integrated":
return name
return ""
</code_context>
<issue_to_address>
**suggestion (code-quality):** Use the built-in function `next` instead of a for-loop ([`use-next`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-next/))
```suggestion
return next(
(
name
for name in self._data
if self._data[name]["type"] == "integrated"
),
"",
)
```
</issue_to_address>
### Comment 8
<location> `jtop/core/thor_gpu.py:190` </location>
<code_context>
def set_railgate(self, name: str, value: bool):
if name not in self._gpu_list:
logger.error(f'GPU "{name}" does not exist')
return False
ok, err = set_rail(bool(value))
if not ok:
logger.error(err or "Failed to set rail-gating")
return False
return True
</code_context>
<issue_to_address>
**suggestion (code-quality):** Remove unnecessary casts to int, str, float or bool ([`remove-unnecessary-cast`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-cast/))
```suggestion
ok, err = set_rail(value)
```
</issue_to_address>
### Comment 9
<location> `jtop/core/thor_power.py:78-80` </location>
<code_context>
def set_rail(allow_idle: bool) -> Tuple[bool, Optional[str]]:
"""allow_idle=True -> 'auto'; False -> 'on'."""
ctrl = _pm_control_path()
if not ctrl:
return False, "GPU runtime PM control node not found"
return _write(ctrl, "auto" if allow_idle else "on")
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
</issue_to_address>
### Comment 10
<location> `jtop/core/thor_power.py:106-107` </location>
<code_context>
def current_governor() -> Optional[str]:
for n in devfreq_nodes():
g = _read(os.path.join(n, "governor"))
if g:
return g
return None
</code_context>
<issue_to_address>
**suggestion (code-quality):** Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
```suggestion
if g := _read(os.path.join(n, "governor")):
```
</issue_to_address>
### Comment 11
<location> `jtop/core/thor_power.py:139-142` </location>
<code_context>
def read_podgov(node: str) -> Dict[str, Optional[str]]:
p = podgov_path(node)
params = ["load_max","load_target","load_margin","k","up_freq_margin","down_freq_margin"]
out = {}
for k in params:
out[k] = _read(os.path.join(p, k)) if p else None
return out
</code_context>
<issue_to_address>
**suggestion (code-quality):** We've found these issues:
- Convert for loop into dictionary comprehension ([`dict-comprehension`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/dict-comprehension/))
- Inline variable that is immediately returned ([`inline-immediately-returned-variable`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/inline-immediately-returned-variable/))
```suggestion
return {k: _read(os.path.join(p, k)) if p else None for k in params}
```
</issue_to_address>
### Comment 12
<location> `jtop/core/thor_power.py:145-147` </location>
<code_context>
def write_podgov(node: str, name: str, value: str) -> Tuple[bool, Optional[str]]:
p = podgov_path(node)
if not p:
return False, f"nvhost_podgov not present on {node}"
return _write(os.path.join(p, name), value)
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
</issue_to_address>
### Comment 13
<location> `jtop/gui/pgpu_thor.py:52` </location>
<code_context>
def compact_gpu(stdscr, pos_y, pos_x, width, jetson, mouse=None):
line_counter = 0
if not jetson.gpu:
data = {
'name': 'GPU',
'color': NColors.green() | curses.A_BOLD,
'online': False,
'coffline': NColors.igreen(),
'message': 'NVIDIA GPU NOT DETECTED/AVAILABLE',
}
basic_gauge(stdscr, pos_y, pos_x, width - 2, data)
return 1
# Draw GPU Gauge (original)
for idx, gpu in enumerate(jetson.gpu.values()):
gpu_gauge(stdscr, pos_y + line_counter, pos_x, width, gpu, idx)
line_counter += 1
# Get current states
try:
val3d = current_governor() or "N/A"
except Exception:
val3d = "N/A"
try:
rs = rail_status()
cv = rs.get("control_value")
valrg = "Enabled" if cv == "auto" else ("Disabled" if cv == "on" else "Unknown")
except Exception:
valrg = "Unknown"
# Define layout
y = pos_y + line_counter
label1_x = pos_x + 1
label1 = "3D scaling: "
field1 = "{" + val3d + "}"
field1_x = label1_x + len(label1)
field1_x_end = field1_x + len(field1) - 1
# Position the second label relative to the middle
label2_x = pos_x + max(width // 2, field1_x_end + 3) # ensure no overlap
label2 = "Railgate: "
field2 = "{" + valrg + "}"
field2_x = label2_x + len(label2)
field2_x_end = field2_x + len(field2) - 1
# Handle clicks
if mouse:
mx, my = mouse
if my == y:
# Check click on 3D scaling
if label1_x <= mx <= field1_x_end:
try:
toggle_governor()
except Exception:
pass
# Check click on Railgate
elif label2_x <= mx <= field2_x_end:
try:
toggle_rail()
except Exception:
pass
# Draw labels
try:
stdscr.addstr(y, label1_x, label1, curses.A_BOLD)
stdscr.addstr(y, field1_x, field1)
# Only draw second label if it fits
if field2_x_end < pos_x + width:
stdscr.addstr(y, label2_x, label2, curses.A_BOLD)
stdscr.addstr(y, field2_x, field2)
except curses.error:
pass
return line_counter + 1 # Add one more line for the labels
</code_context>
<issue_to_address>
**issue (code-quality):** Low code quality found in compact\_gpu - 21% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))
<br/><details><summary>Explanation</summary>The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.</details>
</issue_to_address>
### Comment 14
<location> `jtop/gui/pgpu_thor.py:223` </location>
<code_context>
def draw(self, key, mouse):
# Pass 'mouse' to mouse handler, 'key' to hotkey handler
if self._handle_mouse(mouse) or self._handle_hotkeys(key):
# UI will redraw with new values
pass
height, width, first = self.size_page()
gpu_height = (height * 2 // 3 - 3) // max(1, len(self.jetson.gpu))
self.stdscr.addstr(first + 1, 1, "Temperatures:", curses.A_NORMAL)
for idx, name in enumerate(self.jetson.temperature):
if 'gpu' in name.lower():
sensor = self.jetson.temperature[name]
color_temperature(self.stdscr, first + 1, 15, name, sensor)
# reset regions each frame
self._click_regions = {"scaling": [], "railgate": []}
# Draw GPUs
for idx, (gpu_name, gpu_data) in enumerate(self.jetson.gpu.items()):
chart = self.draw_gpus[gpu_name]['chart']
chart_ram = self.draw_gpus[gpu_name]['ram']
gpu_status = gpu_data['status']
gpu_freq = gpu_data['freq']
size_x = [1, width // 2 - 2]
size_y = [first + 2 + idx * (gpu_height + 1), first + 2 + (idx + 1) * (gpu_height - 3)]
governor = gpu_freq.get('governor', '')
label_chart_gpu = "{percent: >3.0f}% - gov: {governor}".format(percent=gpu_status['load'], governor=governor)
chart.draw(self.stdscr, size_x, size_y, label=label_chart_gpu)
size_x_ram = [1 + width // 2, width - 2]
mem_data = self.jetson.memory['RAM']
total = size_to_string(mem_data['tot'], 'k')
shared = size_to_string(mem_data['shared'], 'k')
if chart_ram is not None:
chart_ram.draw(self.stdscr, size_x_ram, size_y, label="{used}/{total}B".format(used=shared, total=total))
button_position = width // 4
button_idx = 0
# 3D scaling — clickable { … }
y = first + 1 + (idx + 1) * gpu_height - 1
x = 1 + button_idx
try:
gov = (current_governor() or "").strip()
val3d = "Enabled" if gov != "performance" else "Disabled"
except Exception:
val3d = "Unknown"
label = "3D scaling: "
field = "{" + val3d + "}"
try:
self.stdscr.addstr(y, x, label, curses.A_BOLD)
color = NColors.green() if val3d == "Enabled" else curses.A_NORMAL
self.stdscr.addstr(y, x + len(label), field, color)
except curses.error:
pass
self._click_regions["scaling"].append((y, x + len(label), x + len(label) + len(field) - 1))
button_idx += button_position
# Railgate — clickable { … }
y = first + 1 + (idx + 1) * gpu_height - 1
x = 1 + button_idx
try:
rs = rail_status()
cv = rs.get("control_value")
valrg = "Enabled" if cv == "auto" else ("Disabled" if cv == "on" else "Unknown")
except Exception:
valrg = "Unknown"
label = "Railgate: "
field = "{" + valrg + "}"
try:
color = NColors.green() if valrg == "Enabled" else curses.A_NORMAL
self.stdscr.addstr(y, x, label, curses.A_BOLD)
self.stdscr.addstr(y, x + len(label), field, color)
except curses.error:
pass
self._click_regions["railgate"].append((y, x + len(label), x + len(label) + len(field) - 1))
button_idx += button_position
# Power control (informational)
plot_name_info(self.stdscr, first + 1 + (idx + 1) * gpu_height - 1, 1 + button_idx,
"Power ctrl", gpu_data.get('power_control', 'runtime_pm'))
button_idx += button_position
# Optional: GPC lanes frequency gauge remains unchanged if present
frq_size = width - 3
if 'GPC' in gpu_freq:
size_gpc_gauge = (width - 2) // (2 + len(gpu_freq['GPC']))
for gpc_idx, gpc in enumerate(gpu_freq['GPC']):
freq_data = {'name': 'GPC{idx}'.format(idx=gpc_idx), 'cur': gpc, 'unit': 'k', 'online': gpc > 0}
freq_gauge(self.stdscr, first + 1 + (idx + 1) * gpu_height,
width // 2 + gpc_idx * (size_gpc_gauge) + 2, size_gpc_gauge - 1, freq_data)
frq_size = width // 2
gpu_freq['name'] = "Frq"
freq_gauge(self.stdscr, first + 1 + (idx + 1) * gpu_height, 1, frq_size, gpu_freq)
height_table = height - first + 2 + gpu_height
self.process_table.draw(first + 2 + gpu_height, 0, width, height_table, key, mouse)
</code_context>
<issue_to_address>
**suggestion (code-quality):** We've found these issues:
- Remove unnecessary calls to `enumerate` when the index is not used ([`remove-unused-enumerate`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unused-enumerate/))
- Low code quality found in GPU.draw - 10% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))
```suggestion
for name in self.jetson.temperature:
```
<br/><details><summary>Explanation</summary>
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.</details>
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
A–F resolved (exception scope, set_rail cast, _get_first_integrated_gpu rationale, draw_clickable colors, meter_y positioning).
|
@sourcery-ai[bot] review items have been addressed. Addressed Sourcery feedback: A. thor_power.py — confirmed function definitions; no external imports required. Also fixed live frequency meter row placement (now above 3D-Scaling/Railgate line). |
|
@whitesscott fix style |
Implements full Jetson Thor GPU control integration within jtop.
Summary by Sourcery
Add Jetson Thor GPU support in jtop by introducing sysfs-based power and devfreq helpers, a Thor-specific GPU backend and service, a dedicated GUI page with live 3D scaling and rail-gating controls, and auto-detection to load the appropriate implementation.
New Features:
Enhancements: