In a Training Loop 🔄

John Smith PRO

John6666

John6666cat


Recent Activity

reacted to ManniX-ITA's post with 👀 about 12 hours ago

v1.1.0 was Claude + Ollama chat. Eight releases later the stack is a grounded research pipeline plus a local-first memory layer; the token crunch is operational now, not a quality wall.

🚀 claude-hooks v1.8.3: highlights since v1.1.0.

🧠 /consultants v2: agentic council, matured.
🛠 tool_executor: PLAN→REPORT lane runs read_file / grep / glob over the codebase before the researcher speaks; claims grounded in tool output, not vibes.
✍️ coder: sandboxed write_file role with per-language model routing (50KB/file, 1MB/lane caps).
🛡️ CitationLinter: three-layer verifier at the researcher boundary; every `path:line` claim checked against an mtime-cached code_graph. Catches fabricated filenames before they launder through critics + synthesizer.

💾 M14 cross-session memory (default on). LangGraph BaseStore wired across four namespaces: research / tool_results / project / user. Per-namespace TTL: research=30d, tool_results=24h, project+user=forever. Hourly Caliber-style distillation reaper summarizes expiring research into the durable project namespace BEFORE deletion (episodic → semantic, like human consolidation). Originals only dropped after a successful summary write.

🔁 sqlite_vec: full pgvector parity (v1.7). Hybrid recall via RRF over vector cosine + BM25 (FTS5). KG surface: kg_create_entities / kg_add_observations / kg_create_relations / kg_search_nodes. Bundled sqlite-vec-mcp launcher went 3→8 tools so Cursor / Codex / OpenWebUI / Claude Desktop share the same .db. Lazy schema migration carries v1.6.x dbs in place, non-destructive.

🧩 llamafile chat + embed (v1.4 + v1.5). HyDE / reflect / consolidate / get-advice / consultants route to a daemon-supervised local llamafile via the `llamafile://<label>` model prefix. Multi-instance LRU, per-label idle reap, sticky CPU fallback. Stack runs offline now.

🐧 Linux / macOS / Windows. PostgreSQL OR SQLite. Local OR cloud LLMs.

🔗 github.com/mann1x/claude-hooks
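
The "hybrid recall via RRF" mentioned in the post can be illustrated with a minimal sketch of reciprocal rank fusion. This is not claude-hooks code: the function name `rrf_fuse`, the constant `k=60` (a commonly used default) and the sample document ids are assumptions made for illustration only.

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of ids with reciprocal rank fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. `k` is an assumed smoothing constant.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id to keep the order stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector-cosine search and a BM25 search.
vector_hits = ["a", "b", "c"]
bm25_hits = ["b", "d", "a"]
fused = rrf_fuse([vector_hits, bm25_hits])
```

Because only ranks are used, the two retrievers do not need comparable score scales, which is the usual reason for choosing RRF when combining cosine similarity with BM25.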

reacted to Doradus-AI's post with 👍 about 12 hours ago

Tonight we validated a small upstream vLLM fix that brings GLM-5.1-REAP-478B back into our consumer-Blackwell rotation pool.

Sleep/wake on 4× RTX PRO 6000 (SM_120) had a CuMemAllocator race that retired GLM in April: cuMemUnmap runs synchronously from the host the moment a pool-backed tensor's refcount hits zero, but kernels can still be in flight against that storage, accumulating CUDA_ERROR_ILLEGAL_ADDRESS, engine eventually unrecoverable.

vllm-project/vllm#43020 is a one-line torch.cuda.synchronize() at the top of _python_free_callback. Steady-state inference unaffected (only cumem frees pay the cost).

We caught the unpatched bug live during validation:

```
CUDA Error: invalid argument at /build/vllm/csrc/cumem_allocator.cpp:146
```

That's the exact failure class #43020 fixes. With it bind-mounted in: Q3.6-27B sleep/wake cycle clean (25.8 GiB VRAM released on /sleep level=1, engine alive, post-wake chat coherent), GLM 30-request stress test 30/30 PASS, 0 CUDA errors. Back into rotation.

Side win: we're also submitting a generic Triton autotune shmem-budget helper upstream that replaces hand-rolled check_shared_mem() ? [64,128] : [32,64] bucket switches with per-config precision via Triton's existing prune_configs_by={"early_config_prune": ...} hook. Zero change to the H100/H200 fast path. Submitted: vllm-project/vllm#43047

Full writeup with byte math + stress-test logs + the bind-mount overlay pattern: https://doradusresearch.ai/blog/sleep-mode-on-blackwell-part-2/

Hardware: 4× NVIDIA RTX PRO 6000 Blackwell Workstation Edition (SM_120, 95 GiB per GPU, 101 KiB per-block opt-in shmem). Image stack documented in the writeup!
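
As a rough illustration of the per-config shared-memory pruning idea, here is a sketch of what an `early_config_prune` callback might look like. It is not the helper submitted in vllm-project/vllm#43047. The `Config` dataclass is a hypothetical stand-in that only mimics the `kwargs` and `num_stages` attributes of `triton.Config`, and the byte estimate (two operand tiles of a matmul, multiplied by the number of pipeline stages) is an assumed cost model; the 101 KiB budget is the per-block figure quoted in the post.

```python
from dataclasses import dataclass

# Per-block opt-in shared memory quoted in the post for SM_120.
SHMEM_BUDGET_BYTES = 101 * 1024


@dataclass
class Config:
    """Hypothetical stand-in mimicking triton.Config's `kwargs` and `num_stages`."""
    kwargs: dict
    num_stages: int = 2


def estimated_shmem_bytes(config, elem_bytes=2):
    """Assumed cost model: A and B tiles of a matmul, held once per pipeline stage."""
    kw = config.kwargs
    tile_elems = kw["BLOCK_M"] * kw["BLOCK_K"] + kw["BLOCK_K"] * kw["BLOCK_N"]
    return tile_elems * elem_bytes * config.num_stages


def prune_by_shmem(configs, named_args, **kwargs):
    """Keep configs whose estimate fits the budget; fall back to the smallest one."""
    fitting = [c for c in configs if estimated_shmem_bytes(c) <= SHMEM_BUDGET_BYTES]
    return fitting or [min(configs, key=estimated_shmem_bytes)]


candidates = [
    Config({"BLOCK_M": 128, "BLOCK_N": 128, "BLOCK_K": 64}, num_stages=3),  # 98304 bytes
    Config({"BLOCK_M": 128, "BLOCK_N": 128, "BLOCK_K": 64}, num_stages=4),  # 131072 bytes
    Config({"BLOCK_M": 64, "BLOCK_N": 64, "BLOCK_K": 32}, num_stages=2),    # 16384 bytes
]
kept = prune_by_shmem(candidates, named_args={})
```

With real Triton, a function of this shape would be passed as `prune_configs_by={"early_config_prune": prune_by_shmem}` to `triton.autotune`, so each candidate is judged on its own footprint instead of by a coarse two-bucket switch.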

reacted to kanaria007's post with 👀 about 12 hours ago

✅ Article highlight: Honest Benchmarking for Governed Intelligence Platforms (art-60-241, v0.1)

TL;DR: This article argues that benchmark results should be published as bounded observations, not inflated into platform claims. A governed benchmark should not quietly turn “we measured this result under these conditions” into “therefore this platform is more governed, safer, or more production-ready.” Honest benchmarking separates reproducibility, comparability, and disclosability, and keeps benchmark outcomes distinct from stronger governance or platform-readiness claims.

Read: https://huggingface.co/datasets/kanaria007/agi-structural-intelligence-protocols/blob/main/article/60-supplements/art-60-241-honest-benchmarking-for-governed-intelligence-platforms.md

Why it matters:
• prevents benchmark scores from being laundered into governance-readiness claims
• distinguishes reproducible results from truly comparable rankings
• makes public benchmark language respect disclosure floors and evidence class
• gives a clean way to publish strong numbers without overclaiming what they mean

What’s inside:
• the separation between reproducibility, comparability, and disclosability
• the rule that a benchmark result is not the same thing as a platform claim
• a benchmark disclosure profile that sets the publication floor
• a governed benchmark pack that binds runtime, toolchain, policy surface, evidence class, and results
• a comparability declaration and benchmark publication report that state what public reading is actually supportable

Key idea:
Do not say: “we ranked higher, therefore we are better governed.”
Say: “this governed benchmark pack produced these results under this disclosed runtime, toolchain, policy, and evidence surface; this comparability declaration defines what we are and are not fairly comparable to; and this publication report states exactly what public reading is supportable without inflating benchmark observations into stronger platform claims.”



Posts 5

Post

37737

If your Space stopped working after a restart, mainly within the last 5 days (https://discuss.huggingface.co/t/my-space-suddenly-went-offline-the-cpu-cannot-restart/151121/22), try some of the following.
1. Add pydantic==2.10.6 to requirements.txt, or upgrade Gradio to the latest version.
2. Upgrade PyTorch to 2.2.0 or later (torch>=2.2.0 for Zero GPU Spaces).
3. Pin Transformers to 4.49.0 or earlier (transformers<=4.49.0, for Spaces using Transformers or Diffusers).
4. Pin huggingface_hub to an older version (huggingface_hub==0.25.2, if an error such as cached_download not being available occurs or inference does not work properly).
5. Specifying WORKDIR in a Dockerfile may cause the application to fail to start with error 137 (Docker Spaces, https://discuss.huggingface.co/t/error-code-137-cache-error/152177).
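
To see which of the version suggestions above apply to a given environment, a small check like the following may help. It is only a sketch: the `PINS` table simply restates items 1 to 4, the helper names are made up for illustration, and the version parsing is deliberately naive (leading digits only, local suffixes such as `+cu121` ignored). Note that the pydantic and huggingface_hub pins are conditional workarounds in the list above, so a reported mismatch is a hint, not necessarily a problem.

```python
from importlib import metadata

# Constraints restated from the numbered list above.
PINS = {
    "pydantic": ("==", "2.10.6"),
    "torch": (">=", "2.2.0"),
    "transformers": ("<=", "4.49.0"),
    "huggingface_hub": ("==", "0.25.2"),
}


def parse_version(text, width=3):
    """Naive parser: '2.1.2+cu121' -> (2, 1, 2). Pre-release tags are ignored."""
    parts = []
    for piece in text.split("+")[0].split(".")[:width]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts + [0] * (width - len(parts)))


def mismatches(installed):
    """Return the sorted names of packages whose version does not satisfy PINS."""
    checks = {"==": lambda a, b: a == b, ">=": lambda a, b: a >= b, "<=": lambda a, b: a <= b}
    bad = []
    for name, (op, wanted) in PINS.items():
        if name in installed and not checks[op](parse_version(installed[name]), parse_version(wanted)):
            bad.append(name)
    return sorted(bad)


def installed_versions():
    """Collect versions of the pinned packages that are present in this environment."""
    found = {}
    for name in PINS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # Package not installed, nothing to compare.
    return found


# Hypothetical environment used for illustration.
example = {"pydantic": "2.11.0", "torch": "2.1.2+cu121", "transformers": "4.49.0", "huggingface_hub": "0.25.2"}
report = mismatches(example)
```

Calling `mismatches(installed_versions())` inside the Space (for example from a startup log line) would list the packages that differ from the suggested versions.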

About pydantic==2.10.6:
https://discuss.huggingface.co/t/error-no-api-found/146226
https://discuss.huggingface.co/t/internal-server-error-bool-not-iterable/149494

Edit:
Zero GPU space has been upgraded from A100 to H200.
This is likely the reason why older versions of PyTorch are no longer supported.
In fact, an error message to that effect was displayed.
zero-gpu-explorers/README#163

Post

38958

I used up my Zero GPU Quota yesterday (about 12 hours ago). At the time, I got a message saying “Retry at 13:45 (approx.)”, but now it's just changed to “Retry at 03:22”.
Anyway, everyone, let's be careful not to use up our Quota...

Related: https://huggingface.co/posts/Keltezaa/754755723533287#67e6ed5e3394f1ed9ca41dbd