In a Training Loop 🔄

John Smith PRO

John6666

AI & ML interests

None yet

Recent Activity

reacted to ManniX-ITA's post with 👀 about 12 hours ago
v1.1.0 was Claude + Ollama chat. Eight releases later the stack is a grounded research pipeline plus a local-first memory layer; the token crunch is operational now, not a quality wall.

🚀 claude-hooks v1.8.3: highlights since v1.1.0.

🧠 /consultants v2: agentic council, matured.
🛠 tool_executor: PLAN→REPORT lane runs read_file / grep / glob over the codebase before the researcher speaks; claims grounded in tool output, not vibes.
✍️ coder: sandboxed write_file role with per-language model routing (50KB/file, 1MB/lane caps).
🛡️ CitationLinter: three-layer verifier at the researcher boundary; every `path:line` claim checked against an mtime-cached code_graph. Catches fabricated filenames before they launder through critics + synthesizer.

💾 M14 cross-session memory (default on). LangGraph BaseStore wired across four namespaces: research / tool_results / project / user. Per-namespace TTL: research=30d, tool_results=24h, project+user=forever. Hourly Caliber-style distillation reaper summarizes expiring research into the durable project namespace BEFORE deletion (episodic → semantic, like human consolidation). Originals only dropped after a successful summary write.

🔁 sqlite_vec: full pgvector parity (v1.7). Hybrid recall via RRF over vector cosine + BM25 (FTS5). KG surface: kg_create_entities / kg_add_observations / kg_create_relations / kg_search_nodes. Bundled sqlite-vec-mcp launcher went 3→8 tools so Cursor / Codex / OpenWebUI / Claude Desktop share the same .db. Lazy schema migration carries v1.6.x dbs in place, non-destructive.

🧩 llamafile chat + embed (v1.4 + v1.5). HyDE / reflect / consolidate / get-advice / consultants route to a daemon-supervised local llamafile via the `llamafile://<label>` model prefix. Multi-instance LRU, per-label idle reap, sticky CPU fallback. Stack runs offline now.

🐧 Linux / macOS / Windows. PostgreSQL OR SQLite. Local OR cloud LLMs.

🔗 github.com/mann1x/claude-hooks
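The post above mentions hybrid recall via RRF over a vector-cosine ranking and a BM25 ranking. A minimal sketch of Reciprocal Rank Fusion follows, assuming each retriever returns an ordered list of document ids. The function name `rrf_fuse`, the constant `k=60` (a commonly used default), and the sample ids are illustrative assumptions, not taken from claude-hooks.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of doc ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. `k` damps the influence of top ranks; 60 is a
    commonly used value and is an assumption here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the two retrievers.
vector_hits = ["a", "b", "c"]   # ordered by cosine similarity
bm25_hits = ["b", "d", "a"]     # ordered by BM25 score

fused = rrf_fuse([vector_hits, bm25_hits])
print(fused)
```

Because RRF uses only ranks, the cosine and BM25 scores never need to be normalized onto a common scale, which is one plausible reason to choose it for this kind of hybrid search.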
reacted to Doradus-AI's post with 👍 about 12 hours ago
Tonight we validated a small upstream vLLM fix that brings GLM-5.1-REAP-478B back into our consumer-Blackwell rotation pool.

Sleep/wake on 4× RTX PRO 6000 (SM_120) had a CuMemAllocator race that retired GLM in April: cuMemUnmap runs synchronously from the host the moment a pool-backed tensor's refcount hits zero, but kernels can still be in flight against that storage, accumulating CUDA_ERROR_ILLEGAL_ADDRESS, engine eventually unrecoverable.

vllm-project/vllm#43020 is a one-line torch.cuda.synchronize() at the top of _python_free_callback. Steady-state inference unaffected (only cumem frees pay the cost).

We caught the unpatched bug live during validation:

```
CUDA Error: invalid argument at /build/vllm/csrc/cumem_allocator.cpp:146
```

That's the exact failure class #43020 fixes. With it bind-mounted in: Q3.6-27B sleep/wake cycle clean (25.8 GiB VRAM released on /sleep level=1, engine alive, post-wake chat coherent), GLM 30-request stress test 30/30 PASS, 0 CUDA errors. Back into rotation.

Side win: we're also submitting a generic Triton autotune shmem-budget helper upstream that replaces hand-rolled check_shared_mem() ? [64,128] : [32,64] bucket switches with per-config precision via Triton's existing prune_configs_by={"early_config_prune": ...} hook. Zero change to the H100/H200 fast path. Submitted: vllm-project/vllm#43047

Full writeup with byte math + stress-test logs + the bind-mount overlay pattern: https://doradusresearch.ai/blog/sleep-mode-on-blackwell-part-2/

Hardware: 4× NVIDIA RTX PRO 6000 Blackwell Workstation Edition (SM_120, 95 GiB per GPU, 101 KiB per-block opt-in shmem). Image stack documented in the writeup!
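The shmem-budget idea in the post above (drop individual autotune configs that exceed the device's shared-memory limit, instead of switching whole buckets) can be sketched in plain Python. This is not the code from vllm-project/vllm#43047: `TileConfig`, `estimated_shmem_bytes`, and `prune_by_shmem` are hypothetical names, the byte estimate is a rough assumed formula for a pipelined matmul tile, and the larger budget is an illustrative value. Only the 101 KiB per-block figure comes from the post. The pruning function has the shape such a hook would need (candidate configs in, surviving configs out) but avoids `triton.Config` so it runs without a GPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileConfig:
    """Hypothetical stand-in for an autotune config (not triton.Config)."""
    block_m: int
    block_n: int
    block_k: int
    num_stages: int


def estimated_shmem_bytes(cfg, elem_bytes=2):
    """Rough shared-memory estimate for a pipelined matmul tile.

    Assumption: each stage buffers one A tile (M x K) and one B tile (K x N)
    of `elem_bytes`-wide elements. Real kernels may need more or less.
    """
    per_stage = (cfg.block_m * cfg.block_k + cfg.block_k * cfg.block_n) * elem_bytes
    return per_stage * cfg.num_stages


def prune_by_shmem(configs, budget_bytes):
    """Keep only the configs whose estimated footprint fits the budget."""
    return [c for c in configs if estimated_shmem_bytes(c) <= budget_bytes]


SM120_BUDGET = 101 * 1024   # 101 KiB per-block opt-in shmem, from the post
LARGE_BUDGET = 227 * 1024   # illustrative larger budget (assumption)

candidates = [
    TileConfig(128, 128, 64, 3),  # 98304 bytes
    TileConfig(128, 128, 64, 4),  # 131072 bytes
    TileConfig(64, 64, 32, 4),    # 32768 bytes
    TileConfig(128, 256, 64, 3),  # 147456 bytes
]

kept_small = prune_by_shmem(candidates, SM120_BUDGET)
kept_large = prune_by_shmem(candidates, LARGE_BUDGET)
print(len(kept_small), len(kept_large))
```

With the larger budget nothing is pruned, which mirrors the post's claim that the fast path on bigger-shmem parts is unchanged; on the 101 KiB budget only the oversized configs are dropped rather than the whole tile bucket.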
reacted to kanaria007's post with 👀 about 12 hours ago
✅ Article highlight: Honest Benchmarking for Governed Intelligence Platforms (art-60-241, v0.1)

TL;DR: This article argues that benchmark results should be published as bounded observations, not inflated into platform claims. A governed benchmark should not quietly turn “we measured this result under these conditions” into “therefore this platform is more governed, safer, or more production-ready.” Honest benchmarking separates reproducibility, comparability, and disclosability, and keeps benchmark outcomes distinct from stronger governance or platform-readiness claims.

Read: https://huggingface.co/datasets/kanaria007/agi-structural-intelligence-protocols/blob/main/article/60-supplements/art-60-241-honest-benchmarking-for-governed-intelligence-platforms.md

Why it matters:
• prevents benchmark scores from being laundered into governance-readiness claims
• distinguishes reproducible results from truly comparable rankings
• makes public benchmark language respect disclosure floors and evidence class
• gives a clean way to publish strong numbers without overclaiming what they mean

What’s inside:
• the separation between reproducibility, comparability, and disclosability
• the rule that a benchmark result is not the same thing as a platform claim
• a benchmark disclosure profile that sets the publication floor
• a governed benchmark pack that binds runtime, toolchain, policy surface, evidence class, and results
• a comparability declaration and benchmark publication report that state what public reading is actually supportable

Key idea: Do not say: “we ranked higher, therefore we are better governed.” Say: “this governed benchmark pack produced these results under this disclosed runtime, toolchain, policy, and evidence surface; this comparability declaration defines what we are and are not fairly comparable to; and this publication report states exactly what public reading is supportable without inflating benchmark observations into stronger platform claims.”

Organizations

Glide
open/ acc
mekasiu
Solving Real World Problems
FashionStash Group meeting
No More Copyright
XORTRON - Criminal Computing