Scraper
Spider


2026-03-09 02:48
rtx 3090 stories from the last 14 days
221.  HN Show HN: Luna Agent – Custom AI agent in ~2300 lines of Python, no frameworks
Luna Agent is a custom-built AI agent by Fabio Nonato de Paula, written in roughly 2,300 lines of Python with no external agent frameworks, as part of a homelab project. Built to address limitations the author found in the frameworks he evaluated, it keeps the codebase minimal while providing persistent memory backed by SQLite (with full-text search via FTS5), integration through JSON configuration files, safety measures for native operations, and session isolation through a Discord interface. It supports long-context handling and structured JSON logging, and runs on powerful local hardware without cloud-based APIs. Configurable extension points for future enhancements, such as an AI firewall, are documented in its DESIGN.md file. The source code is publicly available on GitHub, accompanied by a technical blog post covering the design choices and motivations.
Keywords: #phi4, AI agent, Discord interface, FTS5, GitHub, JSON logging, LLM traffic, Luna Agent, MCP tool integration, Python, Qwen3-Coder-Next, RTX 3090, SQLite, architectural decisions, conversation compression, design philosophy, embeddings, filtering proxy, frameworks, homelab project, llama-server, tests
    nonatofabio.github.io a day ago
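The persistent memory described above (SQLite with FTS5 full-text search) can be sketched roughly as follows. This is an illustrative assumption, not Luna Agent's actual schema; the table name and columns are made up:

```python
import sqlite3

# Hypothetical sketch of an SQLite-backed agent memory with FTS5 search.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE memory USING fts5(role, content)")
conn.execute("INSERT INTO memory VALUES (?, ?)",
             ("user", "remind me to benchmark the RTX 3090 tomorrow"))
conn.execute("INSERT INTO memory VALUES (?, ?)",
             ("assistant", "noted: GPU benchmark scheduled"))
conn.commit()

# FTS5 MATCH does full-text search; ORDER BY rank sorts by relevance (bm25).
rows = conn.execute(
    "SELECT role, content FROM memory WHERE memory MATCH ? ORDER BY rank",
    ("benchmark",),
).fetchall()
print(rows)
```

Because FTS5 ships with most CPython builds of `sqlite3`, a design like this gets indexed search over conversation history without any external search service.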
915.  HN Show HN: ContextCache – Cache tool schema KV states, skip 99% of prefill tokens
ContextCache is open-source middleware that speeds up large language model (LLM) interactions by caching tool schemas as key-value (KV) states, so that static tool definitions are not redundantly prefilled on every user query. In the project's benchmark this cut request handling from 5,625 ms to 193 ms with 50 tools loaded, while preserving response quality and accuracy. It offers both CPU and GPU deployment options, scales to 100+ tools, and provides independent per-tenant caches with least-recently-used (LRU) eviction. Released under CC BY 4.0, it ships with documentation, a demo app, benchmarks, and integration guides. ContextCache operates in two modes: Route-only Mode, which routes queries without an LLM (~500 ms latency), and Full Pipeline Mode, which handles the complete flow from query routing through execution and synthesis using an external LLM such as Ollama or Claude. Additional features include compatibility with various LLM providers via OpenAI's API, secure server-side credential storage, a web-based admin UI, and content-addressed caching that lets tenants share storage for identical schemas. Overall, ContextCache targets scenarios that need fast, low-overhead handling of LLM requests.
Keywords: #phi4, API keys, CPU orchestrator, Claude, ContextCache, GPU, KV cache, LLM requests, OpenAI, Qwen3-8B, RTX 3090 Ti, content-addressed caching, enterprise features, llamacpp, multi-tenant, parameter extraction, persistent storage, server-side credentials, speedup, synthesis, tool routing, tool schemas, zero degradation
    github.com 4 days ago
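The combination described above, content-addressed cache keys plus LRU eviction, can be sketched in a few lines. This is not ContextCache's actual code; the class, the string stand-in for a KV state, and the `compute_kv` callback are illustrative assumptions:

```python
import hashlib
import json
from collections import OrderedDict

class SchemaKVCache:
    """Sketch: key cached KV states by a content hash of the tool schemas,
    with LRU eviction, so identical schemas (even across tenants) share
    one cache entry and skip the expensive prefill pass."""

    def __init__(self, max_entries=128):
        self.max_entries = max_entries
        self._store = OrderedDict()  # content hash -> cached KV state

    @staticmethod
    def content_key(tool_schemas):
        # Canonical JSON so semantically identical schemas hash equally.
        blob = json.dumps(tool_schemas, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def get_or_compute(self, tool_schemas, compute_kv):
        key = self.content_key(tool_schemas)
        if key in self._store:
            self._store.move_to_end(key)      # mark as recently used
            return self._store[key], True     # cache hit: prefill skipped
        kv = compute_kv(tool_schemas)         # expensive prefill pass
        self._store[key] = kv
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)   # evict least recently used
        return kv, False

cache = SchemaKVCache(max_entries=2)
schemas = [{"name": "search", "parameters": {"q": "string"}}]
kv1, hit1 = cache.get_or_compute(schemas, lambda s: f"kv-for-{len(s)}-tools")
kv2, hit2 = cache.get_or_compute(schemas, lambda s: f"kv-for-{len(s)}-tools")
print(hit1, hit2)  # first call computes, second call hits the cache
```

Hashing a canonical serialization, rather than a tenant-specific ID, is what makes the cache content-addressed: two tenants registering byte-identical schemas converge on the same entry.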
2474.  HN Sparky – useful 'living' OpenClaw bot
Sparky is a "living" robot built with OpenClaw that combines personality design, a voice user interface, and computer-workflow integration. It runs face detection and voice processing on an NVIDIA RTX 3090 and uses AI tool-calling to interact with Emacs, SolveIt, tmux, and macOS. The project reflects the creator's interest in synthesizing these ideas into a functional, engaging robotic companion. A video demonstration shows Sparky handling multi-host networking and moving between workspaces, suggesting its potential as an interactive assistant in computational environments.
Keywords: #phi4, AI tool-calling, NVIDIA RTX 3090, OpenClaw, SolveIt, Sparky, computer workflows, echo cancellation, emacs, face detection, macOS, personality design, robot buddy, tmux, voice UI, voice activity detection, wake word detection, workspace affordances
    alexisgallagher.com 11 days ago
   https://github.com/algal/sparky   11 days ago
2475.  HN Show HN: Provision Stateless GPU Compute with Claude Code's Remote Control
Claude Code's Remote Control, powered by the Terradev MCP Server, lets users provision and manage stateless GPU compute through natural language. GPUs can be provisioned across cloud providers such as AWS, GCP, and Azure from a local environment, with API keys stored securely on the user's own machine. Key features include real-time, cost-optimized provisioning of GPU types such as NVIDIA H100 and A100, creation of NUMA-aware Kubernetes clusters with GPU nodes, model deployment to serverless platforms such as InferX or HuggingFace Spaces, inference endpoint management, and cost optimization. Users can view, stop, start, and terminate instances and analyze cost trends through integrated tools. Setup involves installing the Terradev CLI and MCP Server, configuring API keys locally, and integrating with Claude Code. The tool supports a broad range of GPUs and cloud providers, enabling end-to-end GPU cloud management via conversational commands.
Keywords: #phi4, API Keys, Cloud Providers, Cost Optimization, GPU Compute, GPU Instances, HuggingFace Spaces, Inference Deployment, Kubernetes Clusters, Multi-Cloud, Remote Control, Stateless Provisioning, Terradev MCP
    github.com 11 days ago