feat: add web_fetch built-in tool#85
Conversation
- Add WebFetchConfig with defaults: HTTPS-only, 15s timeout, 20KB max - Add WebFetchContentValidationConfig for injection detection - Add WebFetchFeedSyncConfig for remote feed sources - Wire config into Config struct, loader, persist, and test util - Add htmd dependency for HTML-to-Markdown conversion
- URL validation: scheme check, host denylist/allowlist, IP literal block - SSRF protection: DNS resolution + private IP detection on every hop - Content validation: regex-based prompt injection pattern scanning - HTML processing: HTML-to-Markdown via htmd, plain text passthrough - FeedSync: infrastructure for remote allowlist/denylist feeds - WebFetchTool: full execute pipeline with redirect handling, pagination - Register as 10th built-in tool in ToolRegistry - 99 tests covering all modules
- Add web_fetch section to tools.md with full spec - Add Web Fetch config section to config.md with field reference - Update TOC and section numbering in both files - Add web_fetch.* to hot-reload fields table
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
Walkthroughweb_fetch ツールを追加し、設定モデル、URL/SSRF 検証、HTML→Markdown 処理、コンテンツ検証、ツール本体・登録、テスト、ドキュメント、Cargo 依存を統合した。 Changesweb_fetch ツール実装と設定統合
Sequence Diagram (高レベルの実行フロー): sequenceDiagram
participant Client as ToolCaller
participant Validator as url_validation::validate_url
participant DNS as resolve_dns_and_validate
participant HTTP as reqwest HTTP client
participant HTML as html_processing::process_response_body
participant CV as content_validation::validate_content
participant Tool as WebFetchTool
Client->>Tool: execute(url, timeout, max_bytes)
Tool->>Validator: validate_url(url, config)
Validator->>DNS: if needed resolve_dns_and_validate(host)
DNS-->>Validator: validation result
Tool->>HTTP: fetch (manual redirects loop)
HTTP->>HTML: process_response_body(body, content_type)
HTML->>CV: validate_content(markdown_or_text, config)
CV-->>Tool: OK / ValidationFailure
Tool-->>Client: ToolResult (content + details or error)
Estimated code review effort: Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 11
🧹 Nitpick comments (2)
src/tools/mod.rs (1)
18-19: ⚡ Quick win
#[allow(dead_code)]属性が付いてるのが気になるねweb_fetchモジュールに
#[allow(dead_code)]が付いてるけど、これって一時的なものかな?実装が完了してないコードや未使用の関数が残ってる可能性があるよ。PRサマリーでは99個のテストが追加されたって言ってるから、実装自体は完了してるはずだよね。この属性が本当に必要かどうか確認して、不要なら外しておこう!
♻️ 提案:dead_code属性の確認
以下のコマンドで未使用コードをチェックしてみよう:
#!/bin/bash # Description: web_fetchモジュール内の未使用コードをチェック # web_fetchモジュールのファイルを検索 fd -e rs -p 'src/tools/web_fetch' --exec echo "File: {}" \; # 未使用の関数や構造体を探す(clippy警告から) echo "Running clippy to check for dead code..." cargo clippy --all-targets --all-features -- -W dead_code 2>&1 | rg 'web_fetch'もし警告が出なければ、この属性は削除できるはずだよ!
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/tools/mod.rs` around lines 18 - 19, `#[allow(dead_code)]` が `mod web_fetch;` に付いていますが未使用のコードを無視している可能性があるので、本当に不要か確認して削除してください:まず `mod web_fetch` を参照するテストや呼び出しがすべて存在するか確認し、`cargo clippy --all-targets --all-features -W dead_code` や提案のスクリプトで `web_fetch` に関する dead_code 警告が出ないことを確認したら `#[allow(dead_code)]` を削除し、ビルド・テスト(追加した99個のテストを含む)を通すこと(もし警告が出る場合は未使用の関数を削除するか適切に公開・使用するよう修正する)。src/tools/web_fetch/mod.rs (1)
296-710: ⚖️ Poor tradeoff新規テスト、AAA の区切りをそろえたい。
一部のケースだけ
Arrange/Act/Assertの区切りがあって、ほかは素のままだから、追加したテスト群の読み味が揃ってない。ここは新規分を AAA で統一したほうが追いやすい。 As per coding guidelines,All test functions must follow the AAA (Arrange-Act-Assert) pattern.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/tools/web_fetch/mod.rs` around lines 296 - 710, Several new tests (e.g., fetches_html_and_returns_markdown, fetches_plain_text_as_is, result_details_metadata, truncation_at_max_bytes, truncation_utf8_safe, start_index_continuation, start_index_beyond_content, blocks_content_with_injection, follows_redirect_and_validates, blocks_redirect_to_private_ip, too_many_redirects, http_error_status, timeout_error, untrusted_content_warning, blocks_disallowed_scheme_in_redirect) mix styles; update each to follow the Arrange / Act / Assert (AAA) pattern by clearly separating setup (Arrange) — creating MockServer, configuring Mock responses, and constructing tool via make_tool/test_web_fetch_config — from execution (Act) — single execute(&tool, json!(...)).await call — and from assertions (Assert) — the assert!()/assert_eq!() checks; add short comments or blank lines marking "Arrange", "Act", "Assert" in each test and consolidate any inline setup so each test has one Act call returning result before assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/config/web_fetch.rs`:
- Around line 165-310: Tests in src/config/web_fetch.rs do not consistently
follow the Arrange-Act-Assert (AAA) pattern; update each test
(config_default_values, config_deserialize_full_yaml,
config_deserialize_missing_optional, config_normalize_empty_schemes,
config_normalize_hosts, config_normalize_zero_max_bytes, feed_source_normalize)
to explicitly separate: Arrange (declare yaml inputs and expected values / call
WebFetchConfig::default()), Act (call yaml_serde::from_str(...) and/or
cfg.normalize()), and Assert (all assert_eq!/assert! checks). Ensure each test
clearly constructs inputs first, performs the single action under test second,
and then runs only assertions last to conform to the coding guideline.
In `@src/tools/web_fetch/content_validation.rs`:
- Around line 104-108: The slice to produce scan_text must avoid splitting UTF-8
code points: instead of slicing directly with
&text[..text.len().min(config.max_scan_bytes)], compute an end index <=
config.max_scan_bytes that is a valid UTF-8 char boundary (use
str::is_char_boundary in a loop or walk with text.char_indices to find the last
boundary ≤ max_scan_bytes), then take &text[..end]; preserve the existing else
branch when max_scan_bytes <= 0 and ensure the edge case end == 0 yields an
empty &str rather than panicking; apply this to the scan_text assignment
referencing config.max_scan_bytes and text.
In `@src/tools/web_fetch/feed_sync.rs`:
- Around line 209-414: Multiple test functions (e.g.,
inline_feed_merges_denylist, inline_feed_merges_allowlist, csv_first_column,
skips_disabled_sources, etc.) do not consistently follow the AAA
(Arrange-Act-Assert) layout; refactor each test so they clearly separate the
three phases: Arrange (setup MockServer, build WebFetchFeedSyncConfig and any
source modifications), Act (call resolve_feed_sync(&config).await and capture
the result), and Assert (perform assertions on
result.denylist/result.allowlist). Use short comments or blank lines to mark
"Arrange", "Act", "Assert" and ensure the call to resolve_feed_sync and its
expect/unwrap is only in the Act section (move any assertions after that call).
- Around line 64-77:
HTTPレスポンスのステータスを検証していないため、404/500などのエラー本文がフィードとして扱われています;修正は
client.get(&source.url).send().await の直後にレスポンスに対して .error_for_status()
を呼び出してエラー応答を早期に落とし、失敗時は既存の FeedSyncError::FetchFailed を使って url
とエラー理由(ステータスやエラーメッセージ)を格納するようにしてください(参照シンボル: client, source.url, body 変数取得箇所,
FeedSyncError::FetchFailed)。
In `@src/tools/web_fetch/html_processing.rs`:
- Around line 84-223: The tests (e.g., html_to_markdown_basic,
html_to_markdown_list, html_to_markdown_link, extract_primary_prefers_main,
content_type_html_routes_to_markdown, etc.) must be refactored to follow the AAA
pattern: for each test, create an "arrange" section that assigns input variables
(html/text/json and expected snippets), an "act" section that invokes the
function under test (convert, extract_primary_html, process_response_body) and
stores the result, and an "assert" section that runs the assertions against that
result; update all tests in the diff to follow this structure so inputs,
actions, and assertions are clearly separated and use descriptive local
variables for inputs/results.
- Around line 38-44: The current search for the closing tag uses
lower.find(&close) over the whole string which can return a match before the
opening tag and break extraction; change it to search from content_start onward
by calling lower[content_start..].find(&close) and if found add content_start to
that index to compute content_end, keep the existing check content_end <
content_start and return Some(html[content_start..content_end].to_owned()) as
before so extraction only considers closing tags after the opening tag.
In `@src/tools/web_fetch/mod.rs`:
- Around line 126-127: The code currently uses
params.timeout_secs.unwrap_or(config.timeout_secs) and
params.max_bytes.unwrap_or(config.max_bytes), which lets callers bypass config
limits and then calls response.bytes().await (allocating the whole body), so
large max_bytes can cause OOM; change the logic to clamp incoming values to
config limits (e.g. compute timeout and max_bytes by taking the min of params.*
and config.* or defaulting to config when None using params.timeout_secs.map(|v|
v.min(config.timeout_secs)).unwrap_or(config.timeout_secs) and likewise for
max_bytes) and replace the response.bytes().await path with a bounded streaming
read that stops after the clamped max_bytes (read the response body in chunks
and stop/close once the limit is reached) so you never allocate more than
config.max_bytes; update the code paths that use timeout and max_bytes
accordingly (references: params.timeout_secs, params.max_bytes,
config.timeout_secs, config.max_bytes, response.bytes().await).
- Around line 218-240: The slice at processed[start..] can panic if start is not
a UTF-8 char boundary and next_start is computed incorrectly; before slicing,
adjust start to the nearest UTF-8 char boundary ≤ start by checking
processed.is_char_boundary(start) and decrementing until it is (safeguard
against underflow), then perform the slice; keep the existing truncation loop
that finds end as the nearest boundary ≤ max_bytes, and compute next_start as
start + end (not start + max_bytes) so the returned continuation offset reflects
the actual byte consumption (use the variables start, processed, max_bytes, end,
next_start to locate changes).
In `@src/tools/web_fetch/url_validation.rs`:
- Around line 239-518: The test functions lack explicit Arrange/Act/Assert
separation; update each test (e.g., allows_https_by_default,
blocks_http_by_default, allows_http_when_configured, blocks_ftp_scheme,
blocks_invalid_url, blocks_url_without_host, blocks_denylist_host,
blocks_denylist_subdomain, allows_denylist_unrelated, enforces_allowlist,
denylist_precedes_allowlist, validation_disabled_allows_all, ssrf_* tests,
redirect_* tests, redirect_too_many, host_normalization_* tests) to follow AAA:
clearly mark Arrange by creating/configuring WebFetchConfig or inputs in one
block, Act by calling the function under test (validate_url, validate_redirect,
resolve_dns_and_validate, is_blocked_ip, is_redirect_limit_exceeded,
normalize_host) and assigning the result to a variable, and Assert by performing
assertions on that result; keep the logic the same but move setup, invocation,
and checks into those three labeled steps for readability and consistency.
- Around line 64-65: The IPv6 check only blocks loopback (IpAddr::V6 match arm
using v6.is_loopback()), leaving ULA (fc00::/7) and link-local (fe80::/10)
addresses allowed and enabling SSRF bypass; update that match arm to also reject
v6.is_unique_local() and v6.is_unicast_link_local() (i.e., treat any
v6.is_loopback() || v6.is_unique_local() || v6.is_unicast_link_local() as
blocked) so ULA and link-local ranges are covered by the same validation.
- Around line 193-195: is_redirect_limit_exceeded currently returns
redirect_count > MAX_REDIRECTS which allows one extra hop; change the comparison
in is_redirect_limit_exceeded to use >= so any redirect_count equal to
MAX_REDIRECTS is treated as exceeding the limit (reference: function
is_redirect_limit_exceeded and constant MAX_REDIRECTS).
---
Nitpick comments:
In `@src/tools/mod.rs`:
- Around line 18-19: `#[allow(dead_code)]` が `mod web_fetch;`
に付いていますが未使用のコードを無視している可能性があるので、本当に不要か確認して削除してください:まず `mod web_fetch`
を参照するテストや呼び出しがすべて存在するか確認し、`cargo clippy --all-targets --all-features -W
dead_code` や提案のスクリプトで `web_fetch` に関する dead_code 警告が出ないことを確認したら
`#[allow(dead_code)]`
を削除し、ビルド・テスト(追加した99個のテストを含む)を通すこと(もし警告が出る場合は未使用の関数を削除するか適切に公開・使用するよう修正する)。
In `@src/tools/web_fetch/mod.rs`:
- Around line 296-710: Several new tests (e.g.,
fetches_html_and_returns_markdown, fetches_plain_text_as_is,
result_details_metadata, truncation_at_max_bytes, truncation_utf8_safe,
start_index_continuation, start_index_beyond_content,
blocks_content_with_injection, follows_redirect_and_validates,
blocks_redirect_to_private_ip, too_many_redirects, http_error_status,
timeout_error, untrusted_content_warning, blocks_disallowed_scheme_in_redirect)
mix styles; update each to follow the Arrange / Act / Assert (AAA) pattern by
clearly separating setup (Arrange) — creating MockServer, configuring Mock
responses, and constructing tool via make_tool/test_web_fetch_config — from
execution (Act) — single execute(&tool, json!(...)).await call — and from
assertions (Assert) — the assert!()/assert_eq!() checks; add short comments or
blank lines marking "Arrange", "Act", "Assert" in each test and consolidate any
inline setup so each test has one Act call returning result before assertions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dd8167af-530f-4a22-bd91-90ef020d6e85
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (20)
Cargo.tomldocs/config.mddocs/tools.mdsrc/channels/web/auth.rssrc/channels/web/config.rssrc/config/loader.rssrc/config/mod.rssrc/config/persist.rssrc/config/tests.rssrc/config/types.rssrc/config/web_fetch.rssrc/setup/summary.rssrc/test_util.rssrc/tools/mod.rssrc/tools/sanitizer.rssrc/tools/web_fetch/content_validation.rssrc/tools/web_fetch/feed_sync.rssrc/tools/web_fetch/html_processing.rssrc/tools/web_fetch/mod.rssrc/tools/web_fetch/url_validation.rs
- Fix UTF-8 panic in content_validation byte slicing - Add error_for_status() in feed_sync HTTP requests - Fix closing tag search position in html_processing - Clamp params to config limits (prevent OOM via large inputs) - Fix start_index UTF-8 boundary and next_start_index calculation - Block IPv6 ULA/link-local addresses for SSRF protection - Fix redirect limit off-by-one (>= instead of >) - Add AAA pattern separation to all test functions - 12 new tests covering all fix areas (1004 total)
レビュー対応完了対応したレビューコメント: 11件
検証結果: 1004 tests passed, |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/tools/web_fetch/url_validation.rs (1)
68-88: ⚡ Quick windeny/allow の判定がコピペ重複してる🐰
is_host_denylistedとis_host_allowlisted、シグネチャ以外まったく同じ実装。ルール変えるとき片方だけ直してバグる典型パターンなので、ひとつの helper にまとめたい。あと毎回format!(".{entry_lower}")でアロケしてるのも気になるので、strip_suffix+直前バイトが.か、で書き換えると軽くなる。♻️ 修正案: 共通 helper に集約 + アロケ削減
-/// Returns `true` when `host` matches an entry in `denylist`. -/// -/// Both exact matches and subdomain matches are considered: if the denylist -/// contains `"evil.com"` then `"sub.evil.com"` is also blocked. -fn is_host_denylisted(host: &str, denylist: &[String]) -> bool { - let normalized = host.to_ascii_lowercase(); - denylist.iter().any(|entry| { - let entry_lower = entry.to_ascii_lowercase(); - normalized == entry_lower || normalized.ends_with(&format!(".{entry_lower}")) - }) -} - -/// Returns `true` when `host` matches an entry in `allowlist` (exact match -/// or subdomain match, mirroring the denylist logic). -fn is_host_allowlisted(host: &str, allowlist: &[String]) -> bool { - let normalized = host.to_ascii_lowercase(); - allowlist.iter().any(|entry| { - let entry_lower = entry.to_ascii_lowercase(); - normalized == entry_lower || normalized.ends_with(&format!(".{entry_lower}")) - }) -} +/// Returns `true` when `host` matches an entry in `list` exactly or as a subdomain. +fn host_matches_any(host: &str, list: &[String]) -> bool { + let normalized = host.to_ascii_lowercase(); + list.iter().any(|entry| { + let entry_lower = entry.to_ascii_lowercase(); + if normalized == entry_lower { + return true; + } + normalized + .strip_suffix(&entry_lower) + .and_then(|prefix| prefix.strip_suffix('.')) + .is_some() + }) +}呼び出し側は
is_host_denylisted(host, &config.denylist)→host_matches_any(host, &config.denylist)に置き換え。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/tools/web_fetch/url_validation.rs` around lines 68 - 88, Both functions share identical logic and should be merged into a single helper to avoid duplication and the per-entry allocation from format!; introduce fn host_matches_any(host: &str, list: &[String]) -> bool that lowercases the host once (let normalized = host.to_ascii_lowercase()), then for each entry compute let entry_lower = entry.to_ascii_lowercase() and check normalized == entry_lower || normalized.strip_suffix(&entry_lower).map_or(false, |p| p.ends_with('.')); replace is_host_denylisted and is_host_allowlisted bodies with calls to host_matches_any(host, denylist) and host_matches_any(host, allowlist) respectively.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/tools/web_fetch/url_validation.rs`:
- Around line 55-66: The is_blocked_ip function misses IPv4-mapped IPv6 and
unspecified addresses; update is_blocked_ip to detect and unfold IPv4-mapped
IPv6 (e.g. ::ffff:a.b.c.d) into an IPv4 and apply the existing IPv4 checks, and
explicitly treat unspecified addresses (IPv4 0.0.0.0 and IPv6 ::) as blocked;
modify the IpAddr::V6 branch in is_blocked_ip to check for
v6.to_ipv4_mapped()/or inspect the last 32 bits and re-run the IPv4 logic when
mapped, and add explicit checks for IPv4::UNSPECIFIED and Ipv6Addr::UNSPECIFIED
(or v4.is_unspecified()/v6.is_unspecified()) so both 0.0.0.0 and :: are blocked,
then add tests exercising ::ffff:127.0.0.1, ::ffff:192.168.0.1, 0.0.0.0 and ::
to ensure they return true.
---
Nitpick comments:
In `@src/tools/web_fetch/url_validation.rs`:
- Around line 68-88: Both functions share identical logic and should be merged
into a single helper to avoid duplication and the per-entry allocation from
format!; introduce fn host_matches_any(host: &str, list: &[String]) -> bool that
lowercases the host once (let normalized = host.to_ascii_lowercase()), then for
each entry compute let entry_lower = entry.to_ascii_lowercase() and check
normalized == entry_lower || normalized.strip_suffix(&entry_lower).map_or(false,
|p| p.ends_with('.')); replace is_host_denylisted and is_host_allowlisted bodies
with calls to host_matches_any(host, denylist) and host_matches_any(host,
allowlist) respectively.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 87fcfe0e-d744-4264-8e7a-0f65b66c7ce6
📒 Files selected for processing (6)
src/config/web_fetch.rssrc/tools/web_fetch/content_validation.rssrc/tools/web_fetch/feed_sync.rssrc/tools/web_fetch/html_processing.rssrc/tools/web_fetch/mod.rssrc/tools/web_fetch/url_validation.rs
🚧 Files skipped from review as they are similar to previous changes (5)
- src/tools/web_fetch/content_validation.rs
- src/tools/web_fetch/html_processing.rs
- src/tools/web_fetch/mod.rs
- src/config/web_fetch.rs
- src/tools/web_fetch/feed_sync.rs
- Unfold ::ffff:a.b.c.d to IPv4 and re-run IPv4 checks (prevents ::ffff:127.0.0.1 SSRF bypass) - Block 0.0.0.0 and :: (unspecified addresses) - Add 4 new tests for IPv4-mapped, unspecified cases
レビュー対応完了(2巡目)対応したレビューコメント: 2件
検証結果: 1008 tests passed, |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/tools/web_fetch/url_validation.rs (1)
121-128: 💤 Low value
scheme().to_ascii_lowercase()は冗長
urlクレートはパース時点で scheme を ASCII 小文字に正規化済み。to_ascii_lowercase()で再アロケートしなくても、そのままeq_ignore_ascii_caseで比較すれば十分。config.allowed_schemes側にうっかり大文字が混ざってもeq_ignore_ascii_caseが吸収してくれるから挙動も変わらない。ホットパスじゃないから別に放置でもいいけど、エラー時のメッセージで小文字のスキームを返すなら
parsed.scheme()直接で OK。♻️ こんな感じ
- let scheme = parsed.scheme().to_ascii_lowercase(); + let scheme = parsed.scheme(); if !config .allowed_schemes .iter() - .any(|s| s.eq_ignore_ascii_case(&scheme)) + .any(|s| s.eq_ignore_ascii_case(scheme)) { - return Err(UrlValidationError::SchemeNotAllowed(scheme)); + return Err(UrlValidationError::SchemeNotAllowed(scheme.to_string())); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/tools/web_fetch/url_validation.rs` around lines 121 - 128, スキームを再度小文字化して再アロケートしている箇所を削除してください: 現状の let scheme = parsed.scheme().to_ascii_lowercase(); とそれを使った比較を、parsed.scheme() を直接使う比較に置き換え、config.allowed_schemes.iter().any(|s| s.eq_ignore_ascii_case(parsed.scheme())) のようにしてチェックし、許可されない場合のエラー作成(UrlValidationError::SchemeNotAllowed)にも parsed.scheme() を元にした値を渡すようにしてください(再アロケート不要ならそのまま String 化するか既存の型に合わせて渡す)。
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/tools/web_fetch/url_validation.rs`:
- Around line 86-102: The denylist/allowlist check is using raw config entries
so wildcard entries like "*.example.com" and trailing-dot forms are not
normalized; update behavior by normalizing list entries with the existing
normalize_host / normalize_host_list logic before comparison and consolidate
duplicate logic into a single helper (e.g., is_host_match or is_host_in_list)
that both is_host_denylisted and is_host_allowlisted call; this helper should
lowercase and strip/normalize each allow/deny entry (removing leading "*." and
trailing dots as normalize_host_list does) and then perform the same matching
used currently (exact match or subdomain via ends_with with ".{entry}"), and
replace the bodies of is_host_denylisted and is_host_allowlisted to call this
new helper (also add tests asserting denylist: vec!["*.example.com"] rejects
"sub.example.com").
---
Nitpick comments:
In `@src/tools/web_fetch/url_validation.rs`:
- Around line 121-128: スキームを再度小文字化して再アロケートしている箇所を削除してください: 現状の let scheme =
parsed.scheme().to_ascii_lowercase(); とそれを使った比較を、parsed.scheme()
を直接使う比較に置き換え、config.allowed_schemes.iter().any(|s|
s.eq_ignore_ascii_case(parsed.scheme()))
のようにしてチェックし、許可されない場合のエラー作成(UrlValidationError::SchemeNotAllowed)にも
parsed.scheme() を元にした値を渡すようにしてください(再アロケート不要ならそのまま String 化するか既存の型に合わせて渡す)。
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5691a5a8-4cae-4953-ac09-3cb651006498
📒 Files selected for processing (1)
src/tools/web_fetch/url_validation.rs
…ching - Consolidate is_host_denylisted/is_host_allowlisted into host_matches_any - Apply normalize_host to entries so *.example.com matches subdomains - Add 4 tests for wildcard denylist/allowlist behavior
レビュー対応完了(3巡目)対応したレビューコメント: 2件
検証結果: 1011 tests passed, |
Blocker 1 - Memory DoS prevention: - Add Content-Length pre-check: reject responses exceeding max_bytes before reading any body data - Replace response.bytes() with streaming chunk reader that caps at max_bytes, preventing unbounded memory allocation - Skip start_index pagination for stream-capped bodies (prevents data loss); non-truncated responses retain full pagination - Update affected tests for new Content-Length rejection behavior Blocker 2 - Remove dead feed_sync feature: - Delete feed_sync.rs (440 lines of uncalled code) - Remove WebFetchFeedSyncConfig/WebFetchFeedSource from config types, loader, persist, and docs - Zero remaining feed_sync references in codebase 999 tests pass, cargo fmt + clippy clean
There was a problem hiding this comment.
🧹 Nitpick comments (1)
docs/config.md (1)
320-331: 💤 Low valueYAML例でワイルドカードの実例を見せるとより親切かも
現在の
denylist: []とallowlist: []はデフォルト値として正しいんだけど、176行目で説明されてるワイルドカード形式(*.prefix)の具体例が見られないんだよね。例えばこんな感じで実例を追加すると、ユーザーがすぐに真似できるかも:
denylist: - "*.suspicious.com" - "badhost.example" allowlist: - "*.trusted.org"もちろん空配列のままでも問題ないから、完全にオプショナルな提案だよ!
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/config.md` around lines 320 - 331, Add illustrative wildcard examples to the web_fetch YAML example so readers can see how wildcard patterns are used for denylist and allowlist; update the denylist and allowlist entries under the web_fetch block (referencing keys denylist and allowlist within the web_fetch section and noting the wildcard format described near content_validation) to include sample items such as a domain wildcard like "*.suspicious.com" and a literal host like "badhost.example" for denylist and a wildcard like "*.trusted.org" for allowlist, while keeping empty arrays as an acceptable option.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@docs/config.md`:
- Around line 320-331: Add illustrative wildcard examples to the web_fetch YAML
example so readers can see how wildcard patterns are used for denylist and
allowlist; update the denylist and allowlist entries under the web_fetch block
(referencing keys denylist and allowlist within the web_fetch section and noting
the wildcard format described near content_validation) to include sample items
such as a domain wildcard like "*.suspicious.com" and a literal host like
"badhost.example" for denylist and a wildcard like "*.trusted.org" for
allowlist, while keeping empty arrays as an acceptable option.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c3c8c292-6cc1-4b9c-9c6f-5ec7f5246ca6
📒 Files selected for processing (6)
docs/config.mdsrc/config/loader.rssrc/config/persist.rssrc/config/web_fetch.rssrc/tools/web_fetch/mod.rssrc/tools/web_fetch/url_validation.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- src/config/persist.rs
- src/tools/web_fetch/mod.rs
…es a true hard limit Streaming overflow now returns an immediate error consistent with Content-Length pre-check. Removed truncated/next_start_index/total_bytes from details — only final_url and content_type remain.
max_bytes 20000 → 65536, max_scan_bytes 50000 → 65536. Updated defaults, normalize fallbacks, persist skip-serializing predicates, doc comments, tests, and documentation.
Close #35
概要
web_fetchbuilt-in tool を追加。URL からコンテンツを取得し、HTML を Markdown に変換して返す。SSRF 対策・コンテンツバリデーション・FeedSync インフラを含む。実装内容
Config(
WebFetchConfig)allow_private_ipsで解除可)Tool(
WebFetchTool)ToolRegistryに登録start_index/max_bytesで UTF-8 安全なページネーションテスト
cargo fmt/cargo clippyクリーンコミット構成
feat(web_fetch): add WebFetchConfig types and config integration— Config 型・統合feat(web_fetch): implement WebFetchTool with SSRF and content validation— ツール実装docs(web_fetch): add tool and config documentation— ドキュメント更新Summary by CodeRabbit
New Features
Documentation