Scraper
Spider


2026-03-09 02:48
rtx 3090 stories from the last 14 days
221.  HN Show HN: Luna Agent – Custom AI agent in ~2300 lines of Python, no frameworks
Luna Agent is a custom-built AI agent by Fabio Nonato de Paula, written in roughly 2,300 lines of Python with no external agent frameworks, as part of a homelab project. Built to address limitations the author found in the frameworks he evaluated, it keeps the codebase minimal while providing persistent memory backed by SQLite (with full-text search via FTS5), integration through JSON configuration files, safety measures for native operations, and session isolation through a Discord interface. It supports long-context handling and structured JSON logging, and runs on powerful local hardware without cloud-based APIs. Configurable extension points for future enhancements, such as an AI firewall, are documented in its DESIGN.md file. The source code is publicly available on GitHub, accompanied by a technical blog post covering the design choices and motivations.
Keywords: #phi4, AI agent, Discord interface, FTS5, GitHub, JSON logging, LLM traffic, Luna Agent, MCP tool integration, Python, Qwen3-Coder-Next, RTX 3090, SQLite, architectural decisions, conversation compression, design philosophy, embeddings, filtering proxy, frameworks, homelab project, llama-server, tests
    nonatofabio.github.io a day ago
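The persistent memory described above (SQLite with FTS5 full-text search) can be sketched roughly as follows. This is an illustrative assumption, not Luna Agent's actual schema; the table name and columns are made up:

```python
import sqlite3

# Hypothetical sketch of an SQLite-backed agent memory with FTS5 search.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE memory USING fts5(role, content)")
conn.execute("INSERT INTO memory VALUES (?, ?)",
             ("user", "remind me to benchmark the RTX 3090 tomorrow"))
conn.execute("INSERT INTO memory VALUES (?, ?)",
             ("assistant", "noted: GPU benchmark scheduled"))
conn.commit()

# FTS5 MATCH does full-text search; ORDER BY rank sorts by relevance (bm25).
rows = conn.execute(
    "SELECT role, content FROM memory WHERE memory MATCH ? ORDER BY rank",
    ("benchmark",),
).fetchall()
print(rows)
```

Because FTS5 ships with most CPython builds of `sqlite3`, a design like this gets indexed search over conversation history without any external search service.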
915.  HN Show HN: ContextCache – Cache tool schema KV states, skip 99% of prefill tokens
ContextCache is open-source middleware that speeds up large language model (LLM) interactions by caching tool schemas as key-value (KV) states, so that static tool definitions are not redundantly prefilled on every user query. In the project's benchmark this cut request handling from 5,625 ms to 193 ms with 50 tools loaded, while preserving response quality and accuracy. It offers both CPU and GPU deployment options, scales to 100+ tools, and provides independent per-tenant caches with least-recently-used (LRU) eviction. Released under CC BY 4.0, it ships with documentation, a demo app, benchmarks, and integration guides. ContextCache operates in two modes: Route-only Mode, which routes queries without an LLM (~500 ms latency), and Full Pipeline Mode, which handles the complete flow from query routing through execution and synthesis using an external LLM such as Ollama or Claude. Additional features include compatibility with various LLM providers via OpenAI's API, secure server-side credential storage, a web-based admin UI, and content-addressed caching that lets tenants share storage for identical schemas. Overall, ContextCache targets scenarios that need fast, low-overhead handling of LLM requests.
Keywords: #phi4, API keys, CPU orchestrator, Claude, ContextCache, GPU, KV cache, LLM requests, OpenAI, Qwen3-8B, RTX 3090 Ti, content-addressed caching, enterprise features, llamacpp, multi-tenant, parameter extraction, persistent storage, server-side credentials, speedup, synthesis, tool routing, tool schemas, zero degradation
    github.com 4 days ago
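The combination described above, content-addressed cache keys plus LRU eviction, can be sketched in a few lines. This is not ContextCache's actual code; the class, the string stand-in for a KV state, and the `compute_kv` callback are illustrative assumptions:

```python
import hashlib
import json
from collections import OrderedDict

class SchemaKVCache:
    """Sketch: key cached KV states by a content hash of the tool schemas,
    with LRU eviction, so identical schemas (even across tenants) share
    one cache entry and skip the expensive prefill pass."""

    def __init__(self, max_entries=128):
        self.max_entries = max_entries
        self._store = OrderedDict()  # content hash -> cached KV state

    @staticmethod
    def content_key(tool_schemas):
        # Canonical JSON so semantically identical schemas hash equally.
        blob = json.dumps(tool_schemas, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def get_or_compute(self, tool_schemas, compute_kv):
        key = self.content_key(tool_schemas)
        if key in self._store:
            self._store.move_to_end(key)      # mark as recently used
            return self._store[key], True     # cache hit: prefill skipped
        kv = compute_kv(tool_schemas)         # expensive prefill pass
        self._store[key] = kv
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)   # evict least recently used
        return kv, False

cache = SchemaKVCache(max_entries=2)
schemas = [{"name": "search", "parameters": {"q": "string"}}]
kv1, hit1 = cache.get_or_compute(schemas, lambda s: f"kv-for-{len(s)}-tools")
kv2, hit2 = cache.get_or_compute(schemas, lambda s: f"kv-for-{len(s)}-tools")
print(hit1, hit2)  # first call computes, second call hits the cache
```

Hashing a canonical serialization, rather than a tenant-specific ID, is what makes the cache content-addressed: two tenants registering byte-identical schemas converge on the same entry.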
2474.  HN Sparky – useful 'living' OpenClaw bot
Sparky is a "living" robot built with OpenClaw that combines personality design, a voice user interface, and computer-workflow integration. It runs face detection and voice processing on an NVIDIA RTX 3090 and uses AI tool-calling to interact with Emacs, SolveIt, tmux, and macOS. The project reflects the creator's interest in synthesizing these ideas into a functional, engaging robotic companion. A video demonstration shows Sparky handling multi-host networking and moving between workspaces, suggesting its potential as an interactive assistant in computational environments.
Keywords: #phi4, AI tool-calling, NVIDIA RTX 3090, OpenClaw, SolveIt, Sparky, computer workflows, echo cancellation, emacs, face detection, macOS, personality design, robot buddy, tmux, voice UI, voice activity detection, wake word detection, workspace affordances
    alexisgallagher.com 11 days ago
   https://github.com/algal/sparky   11 days ago
2475.  HN Show HN: Provision Stateless GPU Compute with Claude Code's Remote Control
Claude Code's Remote Control, powered by the Terradev MCP Server, lets users provision and manage stateless GPU compute through natural language. GPUs can be provisioned across cloud providers such as AWS, GCP, and Azure from a local environment, with API keys stored securely on the user's own machine. Key features include real-time, cost-optimized provisioning of GPU types such as NVIDIA H100 and A100, creation of NUMA-aware Kubernetes clusters with GPU nodes, model deployment to serverless platforms such as InferX or HuggingFace Spaces, inference endpoint management, and cost optimization. Users can view, stop, start, and terminate instances and analyze cost trends through integrated tools. Setup involves installing the Terradev CLI and MCP Server, configuring API keys locally, and integrating with Claude Code. The tool supports a broad range of GPUs and cloud providers, enabling end-to-end GPU cloud management via conversational commands.
Keywords: #phi4, API Keys, Cloud Providers, Cost Optimization, GPU Compute, GPU Instances, HuggingFace Spaces, Inference Deployment, Kubernetes Clusters, Multi-Cloud, Remote Control, Stateless Provisioning, Terradev MCP
    github.com 11 days ago