66.
HN
Show HN: I over-engineered a home security camera that uses an LLM and talks
"Roz" is an open-source home security system, written in Python, that runs entirely without cloud services or subscriptions. A Raspberry Pi 4 captures webcam footage and performs OpenCV motion detection, while a separate PC with an RTX 3090 GPU analyzes scenes using the Qwen3.5 language model. The system flags "meaningful changes" in the video feed relative to an established baseline and announces them as spoken alerts via Piper TTS. The architecture is designed for customization: users can adjust motion-detection sensitivity and define their own rules for change detection. Roz can be built with a USB webcam and USB speaker on Linux systems. Installation requires setting up dependencies and configuring the environment, with troubleshooting guidance available for audio and camera issues. The system is distributed under the GNU Affero General Public License v3.0, keeping its source code open and modifiable.
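The project's exact detection code isn't shown here; as a minimal pure-NumPy sketch of the frame-differencing idea (on real frames, OpenCV's `cv2.absdiff` and `cv2.threshold` do the same work), with the threshold and sensitivity values as illustrative assumptions:

```python
import numpy as np

def motion_fraction(baseline, frame, thresh=25):
    """Fraction of pixels whose absolute difference from the baseline
    exceeds `thresh`: the core of frame-differencing motion detection.
    (With OpenCV this is cv2.absdiff + cv2.threshold, usually after a
    Gaussian blur to suppress sensor noise.)"""
    diff = np.abs(baseline.astype(np.int16) - frame.astype(np.int16))
    return float(np.count_nonzero(diff > thresh)) / diff.size

def is_meaningful_change(baseline, frame, sensitivity=0.02):
    """Flag motion when more than `sensitivity` of the image changed;
    the 2% default is an assumption, not Roz's setting."""
    return motion_fraction(baseline, frame) > sensitivity
```

Frames that pass this cheap local filter would then be worth the expensive round trip to the LLM for scene analysis.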
Keywords: #phi4, ALSA audio, DIY project, GNU AGPL-3.0, GPU, Home security, LLM, LM Studio, OpenAI API, OpenCV, Piper TTS, Python, Qwen3.5, Raspberry Pi, TTS synthesis, USB speaker, USB webcam, audio troubleshooting, camera focus, configuration file, frame differencing, hardware enclosure, llama.cpp, local hosting, local processing, meaningful change, motion detection, motion sensitivity, privacy-focused, text-to-speech, uv, vLLM, video feed, vision analysis, web server streaming
github.com 13 hours ago
|
94.
HN
Show HN: Atombot – atomic-lightweight AI assistant for local models and GPT‑5.4
Atombot is a lightweight, self-hosted AI assistant designed to be easy to understand and extend, packing its core functionality into roughly 500 lines of code, far smaller than frameworks like OpenClaw that run to thousands or hundreds of thousands of lines. Its features include persistent memory with searchable logs, Telegram-based access control, one-time and recurring reminders, and a skills system compatible with the OpenClaw SKILL.md format. Atombot supports multiple Large Language Model (LLM) providers, including OpenAI-compatible endpoints and Codex in CLI mode, and offers provider-first onboarding that automatically detects models from Ollama, LM Studio, or Codex and sets up the configuration accordingly.
Installation of Atombot can be done via source code for development purposes or through PyPI. Users can quickly start by initializing a workspace with the `atombot onboard` command, starting a Telegram gateway to interact with the AI assistant via chat, and using either Telegram or CLI for direct communication.
Keywords: #phi4, AI, AI assistant, Atombot, CLI, Codex, GitHub, LLM, LLM provider, OpenClaw, PyPI, Telegram, development, gateway, installation, lightweight, onboarding, persistent memory, personal, project structure, quick start, reminders, self-hosted, skills, skills system, workspace
github.com 15 hours ago
|
201.
HN
How to run Qwen 3.5 locally
The document offers an extensive guide on deploying Alibaba's Qwen3.5 language model family on local devices, covering a range of models from 0.8B to 397B-A17B. It details how users can run these models using tools like Llama.cpp or LM Studio and provides instructions tailored for different hardware setups. The models support a context length of up to 256K across 201 languages and feature hybrid reasoning capabilities, with options for toggling thinking modes.
The guide highlights the use of Unsloth's advanced quantization technology, which enables state-of-the-art performance on lower-bit (3-bit to 8-bit) models optimized for tasks such as coding and long-context processing. Benchmark results show minimal accuracy loss with these optimizations, allowing large models to operate on devices with limited memory. Users can install and execute models via terminal commands and manage model preferences effectively.
Additionally, the guide covers configuring thinking modes for different tasks by adjusting parameters such as temperature and repetition penalties. The benchmarks confirm that Qwen3.5 maintains high accuracy at reduced memory requirements, enabling efficient deployment in both personal and production environments. Overall, the guide is a practical resource for running Alibaba's latest models locally, balancing size and performance across hardware platforms through optimized quantization.
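As an illustration of mode-specific sampling, the presets below follow the values commonly cited for the Qwen3 family (thinking: temperature 0.6, top_p 0.95; non-thinking: temperature 0.7, top_p 0.8); treat them as assumptions and defer to the guide for Qwen3.5's current recommendations. The model path in the commented llama-cpp-python call is a placeholder:

```python
def sampling_params(thinking: bool) -> dict:
    """Per-mode sampling presets. Values mirror the commonly cited
    Qwen3 recommendations; confirm against the guide for Qwen3.5."""
    if thinking:
        # reasoning mode: higher temperature, no repetition penalty
        return {"temperature": 0.6, "top_p": 0.95, "repeat_penalty": 1.0}
    # non-thinking mode: tighter nucleus, mild repetition penalty
    return {"temperature": 0.7, "top_p": 0.8, "repeat_penalty": 1.05}

# With llama-cpp-python (pip install llama-cpp-python), a GGUF quant can
# then be loaded and queried; path and context size are placeholders:
#   from llama_cpp import Llama
#   llm = Llama(model_path="Qwen3.5-GGUF/model.gguf", n_ctx=32768)
#   out = llm.create_chat_completion(
#       messages=[{"role": "user", "content": "Hello"}],
#       **sampling_params(thinking=True))
```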
Keywords: #phi4, Accuracy, Alibaba, Benchmarks, Context, Dynamic 4-bit, GGUF, Hardware, Hybrid Reasoning, Inference, KL Divergence, LLMs, LM Studio, Languages, Medium, Memory Footprint, Multimodal, Non-Thinking Mode, Quantization, Qwen3.5, Settings, Small, Thinking Mode, Tool Calling, Unsloth, llama.cpp
unsloth.ai a day ago
https://gist.github.com/danthedaniel/c1542c65469fb1caaf 23 hours ago
https://github.com/ollama/ollama/issues/14419 22 hours ago
https://github.com/ollama/ollama/issues/14503 22 hours ago
https://www.localscore.ai 20 hours ago
https://www.tommyjepsen.com/blog/run-llm-locally-for-co 20 hours ago
https://github.com/brainless/dwata 20 hours ago
https://github.com/girvo/girvent/ 17 hours ago
https://pchalasani.github.io/claude-code-tools/integrat 17 hours ago
https://unsloth.ai/docs/models/qwen3.5/gguf-b 17 hours ago
https://www.siquick.com/blog/model-quantization-fine-tu 17 hours ago
https://fairwitness.bot/ 12 hours ago
https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF 12 hours ago
https://github.com/daegwang/atombot 12 hours ago
|
413.
HN
Show HN: Hydra – Real-time ops dashboard for developers running AI agents
Hydra is a macOS desktop application crafted specifically for developers who manage multiple AI agents and local development servers, offering real-time operational insights without relying on cloud services or telemetry. Constructed using Electron, React, and TypeScript, it provides comprehensive visibility into system metrics such as CPU/memory usage by processes, port-to-process mappings, Git repository health, network bandwidth, and security posture.
The application supports monitoring of eight AI agent types like Claude Code and Codex, integrating with LM Studio to facilitate local AI briefings without cloud API requirements. It features a robust dashboard consisting of 12 panels that cover workspace health, resource usage, git status, network monitoring, and security scans, among others. Hydra is equipped with auto-heal capabilities to address issues such as high CPU/memory utilization or missing processes/ports based on predefined rules.
Additionally, it includes Claude Code usage tracking, which provides insights into token usage and cost estimates. The app focuses on local data management by storing information in SQLite and allows users to customize settings via a config file or .env file. Built with modern web technologies like Tailwind CSS for styling and Zustand for state management, Hydra's testing is supported by Vitest. Although currently available only on macOS, its framework supports future expansion to other platforms such as Linux and Windows.
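Hydra itself is built in TypeScript; purely to illustrate the shape of a predefined-rule auto-heal engine, here is a hypothetical Python sketch (rule names, thresholds, and actions are invented, not Hydra's actual rules):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class HealRule:
    name: str
    trigger: Callable[[dict], bool]   # inspects a metrics snapshot
    action: Callable[[dict], str]     # returns a description of the fix

def run_rules(metrics: dict, rules: list) -> list:
    """Apply every rule whose trigger fires; collect the actions taken."""
    return [r.action(metrics) for r in rules if r.trigger(metrics)]

# Illustrative rules: high CPU and a missing dev-server port.
rules = [
    HealRule("high-cpu",
             lambda m: m.get("cpu_pct", 0) > 90,
             lambda m: f"restart worker (cpu at {m['cpu_pct']}%)"),
    HealRule("port-missing",
             lambda m: 3000 not in m.get("open_ports", []),
             lambda m: "respawn dev server on :3000"),
]
```

A monitoring loop would build the `metrics` snapshot from the system probes and run `run_rules` on each tick.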
Hydra enhances developer productivity by centralizing the monitoring and management of AI agents and development environments. As an open-source project under the MIT license, it invites community contributions and improvements.
Keywords: #phi4, AI agents, CPU/memory, Claude Code, Electron, Git health, GitHub, Hydra, LM Studio, React, SQLite, Tailwind, TypeScript, Vitest, Zustand, auto-heal engine, configuration, dashboard, git status, local LLM, macOS, network bandwidth, platform support, port mapping, process monitoring, security posture, system tray, testing
github.com 2 days ago
|
421.
HN
Show HN: Rental Property Deal Analyzer – 20 metrics, deal scoring, AI analysis
The Rental Property Deal Analyzer is an open-source tool for evaluating rental property investments, calculating key financial metrics such as Cash-on-Cash Return, Cap Rate, and Debt Service Coverage Ratio (DSCR). A 14-point deal scorecard grades these metrics to help investors make informed decisions. The backend uses FastAPI, serving a plain HTML/CSS/JS frontend with no additional frameworks or build steps. Users can project five-year total returns, incorporating cash flow, appreciation, debt paydown, and tax benefits, and assess how well a property fits various investment strategies.
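The three headline metrics follow standard textbook definitions (the tool's own implementation may differ in detail):

```python
def cap_rate(noi: float, purchase_price: float) -> float:
    """Capitalization rate: net operating income / purchase price."""
    return noi / purchase_price

def cash_on_cash(annual_cash_flow: float, cash_invested: float) -> float:
    """Annual pre-tax cash flow relative to total cash invested
    (down payment, closing costs, rehab)."""
    return annual_cash_flow / cash_invested

def dscr(noi: float, annual_debt_service: float) -> float:
    """Debt Service Coverage Ratio; lenders commonly look for >= 1.25."""
    return noi / annual_debt_service
```

For example, a property with $12,000 NOI bought for $200,000 has a 6% cap rate; if it throws off $4,800 in cash flow on $60,000 invested, the cash-on-cash return is 8%.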
In addition to these features, the tool offers optional AI analysis through platforms like LM Studio, Ollama, or Anthropic Claude, with real-time response streaming. It employs data scraping techniques from Zillow using Playwright as a fallback option when necessary. The interface allows users to input details about property, loans, income, expenses, and reviews, generating detailed investment analyses that include monthly cash flow, comprehensive metrics, and five-year return projections with equity growth insights.
Users have the flexibility to save, compare scenarios, and export results in PDF or HTML format, adhering to an MIT license. The tool's source code is available on GitHub, allowing users not only to utilize its features but also to contribute or customize it according to their needs. This combination of detailed financial analysis and user-friendly functionality makes the Rental Property Deal Analyzer a versatile resource for investors seeking to evaluate rental property opportunities effectively.
Keywords: #phi4, AI Analysis, Break-Even Occupancy, Cap Rate, CapEx Reserve, Cash-on-Cash, DSCR, Deal Analyzer, FastAPI, GRM, HTML Export, Loan Details, Metrics, NOI, Operating Expenses, PDF Export, Playwright, Property Management, ROI, Rental Income, Rental Property, SSE, Strategy Fit, Total Return, Zillow Scraping
rental-property-deal-analyzer.onrender.com 2 days ago
|
439.
HN
Show HN: A local, multi-agent, customizable stack built for researchers
The article presents "Vers3Dynamics R.A.I.N. Lab," an innovative open-source research stack crafted using Rust and Python, aimed at facilitating reproducible experiments through voice conversations. Its primary goal is to offer a customizable, local platform that echoes the ethos of 20th-century Bell Labs, allowing researchers to fluidly transition from conceptual ideas to experimental artifacts without depending on opaque systems. Central to its functionality are two core components: ZeroClaw, a Rust-based agent runtime responsible for orchestration, tool management, and policy enforcement; and James Library, which provides Python workflows specifically tailored for acoustic physics and resonance research, enabling the study of non-linear wave interactions and bio-acoustic phenomena.
Additionally, Vers3Dynamics employs Godot to create multi-agent visual interfaces, enhancing user interaction and understanding. Security is a key consideration within this platform, as it treats all external text inputs as untrusted by default. The setup process has been streamlined for ease of use, featuring pre-built binaries and scripts that facilitate rapid installation across Linux, macOS, and Windows platforms. Emphasizing reliability, the system includes repo integrity checks and efficient handling of gateway requests.
Development tools such as Rust's cargo and Python's pip are utilized for testing and formatting purposes, ensuring a smooth development experience. Comprehensive documentation is provided under the MIT License to support user adoption and collaboration. Originally developed by Vers3Dynamics as a research and development tool, this platform has been made open-source to encourage wider collaboration within the research community.
Keywords: #phi4, AI, CLI, Godot, James Library, MIT License, Python, R&D, Rust, Vers3Dynamics, ZeroClaw, acoustic physics, agents, benchmarks, execution engine, experiments, gateway, health check, memory system, orchestration, policy enforcement, reasoning, resonance, runtime, synthesis, virtual environment, visualization, voice conversations, workflows
github.com 2 days ago
|
528.
HN
World Monitor – AI-powered news aggregation
World Monitor is an AI-driven global intelligence platform that offers real-time news aggregation, geopolitical monitoring, and infrastructure tracking via a unified dashboard. It integrates over 435 curated feeds from more than 100 sources into categories including geopolitics, technology, finance, commodities, and positive news. The platform enhances situational awareness with interactive maps displaying up to 45 data layers such as conflicts, military bases, and trade routes. Key features include AI-generated geopolitical briefs, real-time updates with live video streams, and a comprehensive market radar providing financial insights. Supporting content in 21 languages, World Monitor is accessible through web-based platforms and native desktop applications for macOS, Windows, and Linux without any user costs, utilizing open-source technologies.
The platform employs advanced AI models like Ollama and Groq to facilitate summarization, deduction, and threat classification, offering dual map engines with both 3D globes and flat maps. World Monitor provides API access for developers, prioritizing security through CORS origin allowlists and input sanitization. Community contributions are encouraged, with development guidelines, deployment details, and licensing information available under AGPL-3.0 in the project's repository. Users can explore insights via various subdomains tailored to general insights and specific domains such as tech, finance, commodities, and positive trends. For support or security issues, users have designated contact channels, acknowledging responsible vulnerability disclosures by researchers.
Keywords: #phi4, AI summarization, AI-powered, Country Instability Index, desktop app, dual map engine, geopolitical monitoring, infrastructure tracking, multi-signal analysis, native-language support, news aggregation, open-source, real-time updates, threat classification
github.com 2 days ago
|
537.
HN
Show HN: Utter, a free local dictation and meeting notes app for Mac and iPhone
"Utter" is a free application for Mac and iPhone that transforms voice notes into clean, well-formatted text, with a strong emphasis on privacy and local data handling. It offers fast transcription with sub-second turnaround and customizable post-processing to enhance clarity, at no cost and with no cloud storage. Key functionality includes personalized shortcuts, multiple workflow modes, speaker-labeled transcripts from audio recordings, context-aware processing for more relevant text output, link summarization within notes, and Markdown note editing. The app keeps all data local while offering seamless synchronization through iCloud, with no account setup required. Designed for privacy-conscious users, "Utter" bridges phone and desktop by turning rough voice recordings into polished text documents, addressing the demand for intuitive, secure dictation tools that process audio locally.
Keywords: #phi4, AI chat, BYOK, LM Studio, Mac, Markdown editor, Ollama, Parakeet, Utter, audio/video file transcription, context-aware processing, dictation app, dictation keyboard, iCloud sync, iPhone, link summarization, local models, local workflows, meeting recording, no account registration, post-processing, privacy, shortcuts, speaker-labeled transcripts, transcription
utter.to 2 days ago
|
558.
HN
Show HN: Triplecheck – Review your code free with local LLMs
Triplecheck is an open-source AI-driven code review tool designed to facilitate thorough and cost-effective code reviews by utilizing local language models such as Qwen3-Coder or DeepSeek Coder, avoiding the expenses associated with API usage. It features a multi-pass review cycle that conducts up to five rounds of reviews from diverse perspectives, incorporating a voting mechanism to reduce false positives. Additionally, it supports both local and cloud hybrid models for efficient resource utilization, offering initial reviews locally while utilizing cloud models like Claude Opus for quality judgment.
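The voting mechanism can be sketched as a simple threshold over findings reported across passes; the string representation of a finding here is an assumption, not Triplecheck's actual data model:

```python
from collections import Counter

def vote_findings(passes, min_votes=2):
    """Keep only findings reported by at least `min_votes` independent
    review passes, suppressing one-off false positives. Each pass is a
    list of finding identifiers; duplicates within a pass count once."""
    counts = Counter(f for findings in passes for f in set(findings))
    return sorted(f for f, n in counts.items() if n >= min_votes)
```

A finding flagged by three of five passes survives; a hallucinated issue mentioned once does not.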
The tool integrates comprehensive testing automatically after each code fix attempt, ensuring that regressions are identified early in the process. It provides structured feedback on potential bugs, detailing aspects such as file location, line number, severity, and suggested fixes. Furthermore, Triplecheck allows users to customize its pipeline, enabling model configuration, behavior adjustments, and integration with static analysis tools.
Currently, Triplecheck supports multiple programming languages including Python, Go, and Rust, and is effective in bug detection across extensive codebases. However, it lacks GitHub PR integration and incremental reviews, though these features are planned for future development. Compared to other AI code review tools like CodeRabbit and Sourcery, Triplecheck distinguishes itself by offering free local operations and a more robust multi-pass review engine that includes actual code fixes rather than mere suggestions.
Looking ahead, Triplecheck's roadmap aims to enhance its capabilities through GitHub PR integration, support for incremental diff-only reviews, and the generation of PR summaries. Future enhancements include developing a VS Code extension, web report viewer, and expanding platform compatibility to encompass GitLab and Bitbucket. The tool is built using Python and Click CLI, with configuration options compatible with various OpenAI-compatible backends or local LLMs, positioning Triplecheck as a versatile option for developers seeking AI-enhanced code reviews without recurring costs.
Keywords: #phi4, AI, CI test gate, CLI, GitHub, GitHub integration, LLMs, OpenAI-compatible, PR summary, Python, SARIF output, SAST integrations, Triplecheck, VS Code extension, bugs, code review, diff-only review, free API cost, local models, multi-pass voting, patches, severity, static analysis, structured findings, tests, tree-sitter
github.com 3 days ago
|
567.
HN
Atombot – A tiny but powerful personal AI assistant
Atombot is a streamlined personal AI assistant that implements its core functionality in about 500 lines of code, notably smaller than comparable projects such as OpenClaw and nanobot. It integrates with multiple Large Language Model (LLM) providers via OpenAI-compatible endpoints, and with Codex through CLI mode. The bot features Telegram-based chat access control, persistent long-term memory with searchable logs, scheduled reminders, and a skills system compatible with OpenClaw's SKILL.md format. Atombot handles tasks such as web fetching, coding assistance, and schedule management. It can be installed from source for development or from PyPI for everyday use. Setup involves initializing the workspace, which detects available providers, optionally configuring Telegram integration, and then interacting via Telegram or the CLI.
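Atombot's actual storage schema isn't documented here; a minimal sketch of persistent, searchable memory using only the standard library might look like the following (table layout and method names are hypothetical):

```python
import sqlite3

class Memory:
    """Append-only conversation log with substring search: the idea
    behind persistent memory with searchable logs, not Atombot's code."""

    def __init__(self, path=":memory:"):
        # A file path makes the log persist across restarts.
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS log("
                        "id INTEGER PRIMARY KEY, role TEXT, text TEXT)")

    def remember(self, role, text):
        self.db.execute("INSERT INTO log(role, text) VALUES(?, ?)",
                        (role, text))
        self.db.commit()

    def search(self, term):
        # LIKE-based substring match; SQLite FTS5 would scale better.
        rows = self.db.execute(
            "SELECT text FROM log WHERE text LIKE ? ORDER BY id",
            (f"%{term}%",))
        return [r[0] for r in rows]
```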
Keywords: #phi4, AI, AI assistant, Atombot, CLI, Coding, GitHub, LLM provider, OpenClaw, PyPI, Schedule Manager, Telegram, Web Fetch, configuration, gateway, interactive chat, nanobot, onboarding, persistent memory, reminders, skills, skills system, terminal, workspace
github.com 3 days ago
https://github.com/daegwang/atombot 3 days ago
|
632.
HN
Show HN: Canvo – AI agent with live canvas and Linux sandbox on Android
Canvo is an Android application that turns a mobile device into an AI workstation by combining an interactive canvas, a real Linux environment, and a broad set of productivity tools. Its standout feature, the AI Agent, goes beyond a traditional chatbot by creating dynamic, live workspaces within conversations. Users can work with data through the Data Canvas, which supports interactive elements such as dashboards, charts, forms, and quizzes. The Linux Sandbox provides access to over 300 Unix commands and allows installing programming languages like Python and Node.js, enabling local web app development directly on the device.
Canvo builds tools automatically for tasks such as file management and notifications, and supports persistent scripts and autonomous operations. The application prioritizes privacy with local-first data storage and user-controlled AI endpoints via Bring Your Own Keys (BYOK), with no cloud sync or telemetry. Installation requires downloading an APK and permitting installs from unknown sources on Android 13+ devices with arm64-v8a architecture.
Canvo's autonomous capabilities include proactive features like scheduled tasks, memory retention, and automated notifications for updates, such as morning briefings. Currently in beta, Canvo invites user feedback to refine its functionalities and allows users to switch between different AI models per session based on task requirements, supporting a variety of providers including Google Gemini, Anthropic Claude, OpenAI GPT, Groq Llama, among others.
Keywords: #phi4, AI Agent, AI Workstation, Android, Autonomous Tasks, Beta Development, Data Visualization, Interactive Canvas, Linux Sandbox, OpenAI-Compatible, Persistent Workspace, Privacy First, Unix Commands
github.com 3 days ago
|
710.
HN
Show HN: Nemilia – multi-agent AI workspace in a single HTML file, no back end
Nemilia is an AI workspace for multi-agent orchestration contained in a single HTML file, requiring no backend infrastructure. It gives users full control over their data, models, and workflows directly on their own devices, emphasizing privacy and user sovereignty. Key features include custom agents with distinct roles and personalities built through an intuitive drag-and-drop interface, support for multiple AI providers such as OpenAI and Anthropic, and offline operation via WebGPU for local model execution.
The platform offers advanced functionalities such as document retrieval augmented generation (RAG) with hybrid search methods, human-in-the-loop checkpoints within workflows, and secure data processing entirely on the client side. Nemilia supports a variety of modes including chat, research reports, and visual content creation, while allowing workspace synchronization to local folders for version control.
VISION is highlighted as an integral tool for image generation, capable of producing code-based visuals without external keys and supporting AI-generated images from multiple providers. It emphasizes the capability to run models locally in modern browsers using WebGPU after initial setup, with specific VRAM requirements based on model choice.
The MCP Tool Execution Tutorial guides users through setting up a workspace folder and initiating an MCP Server for integration within Nemilia. This involves configuring connections to the MCP server, defining agents that use TOOLCALL blocks for file interactions via external tools—all processed client-side. The tutorial also covers workspace management to ensure non-destructive edits and updates.
Additional features include customizable prompts, memory systems for workflow history retrieval, and advanced configurations for AI Provider settings, agent creation, and execution flow control. Compatibility notes address browser requirements and keyboard shortcuts, while the changelog provides insights into ongoing enhancements, bug fixes, and system optimizations across Nemilia versions.
Keywords: #phi4, AI sovereignty, AI-generated images, API keys, Business Source License, DAG execution, HITL review, HTML file, MCP protocol, Nemilia, VISION, WebGPU, agents, browser inference, browser-native, client-side, code-based visuals, data privacy, document RAG, file system API, human-in-the-loop, hybrid search, image generation, live web research, local models, memory injection, memory system, model overrides, multi-agent AI, no backend, offline mode, orchestrator, predictive execution engine, prompt templates, provider-agnostic, semantic vector search, tool execution, visual content generation, workflow management, workflows, workspace, workspace sync, zero servers
github.com 3 days ago
|
714.
HN
Show HN: Neo – AI-powered native .NET desktop app generator
N.E.O. is an innovative AI-powered tool designed to convert natural language prompts into live .NET desktop applications seamlessly. The setup process is straightforward, requiring only the standard .NET runtime while automatically managing additional dependencies like Python when necessary. This tool enables users to develop native Windows applications using WPF or Avalonia frameworks and supports iterative development through plain language commands. It also accommodates hybrid stacks by integrating C#, web technologies, and Python.
The technical capabilities of N.E.O. are extensive. It offers SDK-less compilation, automatic dependency management, and self-healing features that address errors and crashes. Users benefit from visual editing options, robust security measures with optional sandboxing, and a branching undo/redo system to enhance productivity. Additionally, the applications can be exported across different platforms and integrated with AI services during runtime.
The author contemplates whether N.E.O., originally conceived as a side project, could serve as a valuable open-source initiative. This consideration is particularly pertinent for niche areas where desktop applications surpass web-based solutions in performance, such as enterprise tools or offline applications. Although the code requires further refinement, there's potential to polish it and contribute to the developer community, leveraging its unique capabilities.
Keywords: #phi4, AI-powered, C# toolchain, NEO, NET, SDK-less compilation, community project, cross-platform export, desktop app generator, frictionless setup, hybrid stack, native applications, natural language prompts, security sandboxing
news.ycombinator.com 3 days ago
|
726.
HN
Show HN: RAGLight, serve a RAG pipeline as a REST API and chat UI in one command
RAGLight is a versatile Python library designed for implementing Retrieval-Augmented Generation (RAG), integrating document retrieval with natural language inference. It supports various large language models and embedding providers, facilitating the creation of context-aware AI solutions. The library features a new `serve` command that launches a FastAPI server with an optional Streamlit chat UI, providing an interactive RAG pipeline accessible via both a REST API and user interface.
Key components include modular integration of different LLMs, embeddings, and vector stores, supporting models like HuggingFace's MiniLM for efficient vector embedding. The Agentic RAG Pipeline enhances performance using an Agent to improve results. It also offers MCP Integration, allowing external tool capabilities such as code execution and database access via MCP servers.
RAGLight supports flexible document ingestion from diverse formats including PDFs, TXTs, DOCXs, etc., and features an extensible architecture for swapping vector stores, embedding models, or LLMs. The library can be deployed swiftly with a REST API using environment variables for configuration. It includes health checks, question generation, document ingestion (locally or from GitHub), file uploads via multipart/form-data, and listing collections.
Additional tools include an Interactive CLI for rapid setup and interaction with documents, and Docker Deployment options with example images provided. A notable feature is the hybrid search option combining BM25 keyword-based retrieval and dense vector similarity search using Reciprocal Rank Fusion (RRF) to enhance accuracy. Installation is straightforward via pip, with extensive documentation available to assist users in configuration and deployment processes.
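Reciprocal Rank Fusion is compact enough to sketch; this is the standard formulation (k = 60 is the common default from the original RRF paper), not RAGLight's internal code:

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids: each document scores
    sum(1 / (k + rank)) over the rankings it appears in, so documents
    ranked well by both BM25 and dense retrieval rise to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked 1st and 3rd beats one ranked 1st in only a single list, which is exactly the agreement effect hybrid search is after.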
Keywords: #phi4, BM25, Docker, FastAPI, LLMs, MCP Integration, RAGLight, REST API, Reciprocal Rank Fusion, Retrieval-Augmented Generation (RAG), Streamlit, agent pipeline, chat UI, code execution, database access, document retrieval, embeddings, extensible architecture, external tools, hybrid search, language generation, semantic search, vector stores
github.com 3 days ago
|
793.
HN
Show HN: Nemilia – multi-agent AI workspace in a single HTML file, no back end
Nemilia is an advanced browser-based tool that allows users to create and manage multi-agent AI systems entirely on the client side without any server dependency. It operates within an HTML file, eliminating the need for backend setups, installations, or account creation. The platform emphasizes AI sovereignty by granting users complete control over their agents, workflows, data, and encryption keys, ensuring privacy from third-party platforms.
Key features of Nemilia include custom agent creation with distinct roles and personalities, a drag-and-drop interface for designing workflows that can chain multiple agents in any desired order, and the inclusion of human-in-the-loop review checkpoints. Agents have the capability to execute external tools in real-time via the Model Context Protocol (MCP) and perform document retrieval augmented generation using both semantic and keyword searches processed client-side with vector embeddings and BM25.
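Nemilia's search runs client-side in JavaScript; as a language-neutral illustration of the BM25 scoring it relies on, here is the standard per-term formula in Python:

```python
import math

def bm25_score(tf, df, n_docs, doc_len, avg_len, k1=1.5, b=0.75):
    """Standard BM25 term score: an IDF weight times a saturated,
    length-normalised term frequency. k1 controls saturation, b controls
    how strongly longer documents are penalised."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
    return idf * norm
```

Summing this score over the query terms present in a document, and fusing the result with vector similarity, gives the hybrid ranking described above.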
Nemilia supports a wide range of AI providers such as OpenAI, Anthropic, Groq, Gemini, etc., allowing users to switch seamlessly between them and run models locally through WebGPU for offline capabilities. Security is maintained by encrypting API keys using AES-256-GCM within the browser and ensuring no data leaves the user's machine unless initiated explicitly by the user.
The tool offers high portability by syncing workspaces to local folders, facilitating version control and editing. Its architecture ensures all processing is done client-side, enhancing both performance and security. Nemilia provides a comprehensive AI workspace solution prioritizing data sovereignty, cross-platform compatibility, and user flexibility in their AI projects.
The accompanying tutorial for Nemilia outlines how to leverage the platform for image generation and local model execution without server connections. It covers generating code-based visuals like charts using Chart.js, SVG diagrams, HTML infographics, and AI-generated images with various providers requiring API key configuration. Local model execution is possible on supported browsers through WebGPU, facilitating direct browser operation of models such as Llama or Mistral.
The tutorial also details setting up local workspace folders for file syncing without overwriting existing data and employing prompt templates and a memory system for continuity in tasks across AI sessions. It introduces Model Context Protocol (MCP) execution with external tool operations like file manipulation, using a local MCP server setup through Supergateway. Additionally, it demonstrates constructing multi-agent workflows that enable agents to work sequentially or in parallel on tasks such as web research and report writing.
Nemilia includes settings for defaults controlling output tokens, temperature, retries, storage options, live reasoning badges, context safety checks, WebGPU model expansion, and a polished UI enhancing user experience. Licensed under the Business Source License 1.1 (BSL 1.1), Nemilia will transition to an MIT license in February 2030, with commercial usage before then requiring separate licensing agreements.
Overall, this tutorial provides a robust framework for utilizing both code-based and AI-generated visuals within Nemilia's ecosystem, alongside local execution of complex models and integration with external tools to boost productivity and workflow automation.
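The agent-chaining idea described above can be sketched in a few lines. The following is an illustrative Python sketch, not Nemilia's actual browser-side implementation: agents are stand-in functions chained in order, with an optional human-in-the-loop hook between steps, and the `Agent`/`run_pipeline` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Agent:
    name: str
    role: str
    run: Callable[[str], str]  # stand-in for an LLM call

def run_pipeline(agents: List[Agent], task: str,
                 review: Optional[Callable[[str, str], str]] = None) -> str:
    """Pass the task through each agent in order; an optional
    human-in-the-loop `review` hook can amend each intermediate result."""
    result = task
    for agent in agents:
        result = agent.run(result)
        if review is not None:
            result = review(agent.name, result)
    return result

# Toy agents standing in for model-backed ones
researcher = Agent("researcher", "gather facts", lambda t: t + " | facts")
writer = Agent("writer", "draft report", lambda t: t + " | draft")

final = run_pipeline([researcher, writer], "topic")
```

A review checkpoint would be passed as `review=lambda name, text: input(f"{name} says: {text}\nEdit? ") or text` in an interactive setting.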
Keywords: #phi4, AI provider, AI sovereignty, AI-generated images, API keys encryption, BM25 keyword search, BSL 11 license, DAG pipeline, HITL checkpoints, HTML file, MCP tool execution, Nemilia, WebGPU offline mode, browser inference, browser-native, chat interface, client-side, code-based visuals, custom agents, document RAG, encryption, file system operations, human-in-the-loop review, hybrid Transformersjs embeddings, image generation, image providers, local inference, local models, memory system, multi-CDN fallback, multi-agent AI, no backend, orchestrator, predictive execution engine, prompt templates, provider-agnostic, reasoning model support, semantic search, semantic vector RAG, session memory, visual progress ring, visual workflow design, web search providers, workflow builder, workflows, workspace, workspace sync, zero servers
github.com 4 days ago
|
813.
HN
Show HN: Open dataset of real-world LLM performance on Apple Silicon
Anubis OSS is an open-source benchmarking tool developed to evaluate the performance of local AI applications on Apple Silicon devices, such as M1 through M4 chips. It addresses a gap in community-driven data by enabling users to conduct and submit benchmarks across various models using backends like Ollama and LM Studio. The tool leverages native SwiftUI, avoiding external dependencies, to collect hardware telemetry while assessing inference performance. Anubis simplifies the benchmarking process with rapid execution times and one-click result submissions, fostering a comprehensive open dataset that enhances understanding of efficiency and configuration impacts on Apple Silicon. This community-driven dataset offers insights into quantization effects and thermal management, and helps identify suboptimal setups, filling gaps left by synthetic benchmarks or limited reviews. Starring Anubis on GitHub helps it qualify for Homebrew Cask distribution, broadening access and supporting further tool development, research, and optimization for Apple Silicon platforms.
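For context on the numbers such a benchmark reports, here is a minimal sketch of the two throughput figures local-LLM benchmarks typically compute, prefill speed and generation speed in tokens per second. This is illustrative only and is not Anubis's Swift implementation:

```python
def throughput(prompt_tokens: int, prefill_s: float,
               gen_tokens: int, gen_s: float) -> dict:
    """Compute the two numbers local-LLM benchmarks usually report:
    prompt processing (prefill) and generation speed, in tokens/second."""
    return {
        "prefill_tok_s": prompt_tokens / prefill_s,
        "generation_tok_s": gen_tokens / gen_s,
    }

# Example: a 512-token prompt prefilled in 2 s, then 256 tokens generated in 8 s
stats = throughput(prompt_tokens=512, prefill_s=2.0, gen_tokens=256, gen_s=8.0)
```

Reporting both matters because quantization and thermal throttling can affect prefill and generation very differently.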
Keywords: #phi4, Anubis OSS, Apple Silicon, IOReport, LLM performance, Open dataset, OpenAI-compatible backend, SwiftUI app, community resource, hardware telemetry, leaderboard submissions, local AI benchmarking, quantization efficiency
devpadapp.com 4 days ago
https://github.com/ggml-org/llama.cpp/discussions& 3 days ago
|
906.
HN
Show HN: TerminalNexus – Turn CLI commands into reusable buttons (Windows)
TerminalNexus is a Windows-based tool developed by Dan to streamline the usage of Command Line Interface (CLI) commands, transforming them into easily accessible buttons within a multi-tab terminal environment. This facilitates users in organizing and executing commands efficiently without having to manually search through notes or command history. The application boasts several advanced features: it allows for scheduling commands with output tracking, generates AI-driven summaries from command outputs, and can produce Git commit messages. Additionally, TerminalNexus provides optional security checks prior to commits and enables conversion between different shell types—Bash, PowerShell, and CMD. Users gain insights into runtime performance and codebase metrics through its interface.
TerminalNexus supports integration with both local and cloud-based AI providers, including Ollama, OpenAI, Anthropic, OpenRouter, and LM Studio. It also offers the capability to schedule recurring tasks that are automatically summarized upon completion, enhancing productivity. The tool allows customization for data retention, ensuring that if a local model is used, user data remains on their machine. Currently exclusive to Windows users, TerminalNexus includes a free 14-day trial without requiring any signup process. Additional details and download links can be found at Safesoftwaresolutions.com.
Keywords: #phi4, AI, AI summaries, Anthropic, Bash, CLI, CLI commands, CMD, CWE, CWE Top 25, Git, Git commit messages, LM Studio, OWASP, OWASP Top 10, Ollama, OpenAI, OpenRouter, PowerShell, TerminalNexus, Windows terminal, Windows-only, buttons, cloud AI, cloud AI providers, codebase, codebase insights, command scheduling, free trial, local AI, local AI providers, reusable buttons, runtime, runtime insights, scheduling, scripts, shell, shell conversion
news.ycombinator.com 4 days ago
|
924.
HN
New RAGLight feature: deploy a RAG pipeline as a REST API with one command
RAGLight is a versatile Python library designed to enhance Large Language Models (LLMs) through Retrieval-Augmented Generation (RAG), enabling document retrieval capabilities for building advanced, context-aware AI solutions. It emphasizes modularity, allowing users to integrate various LLMs from providers like Ollama, LMStudio, Mistral, OpenAI, and Google, alongside embedding models such as HuggingFace's all-MiniLM-L6-v2. The library includes key features such as an agentic RAG pipeline for improved performance, MCP integration for external tool capabilities (e.g., code execution and database access), flexible support for diverse document types like PDFs and TXT files, and an extensible architecture allowing easy component swaps.
RAGLight supports seamless deployment options including a REST API accessible via `raglight serve`, eliminating the need to write Python code and enabling configuration through environment variables. It also provides a command-line interface with tools such as `raglight chat` for interactive document selection and dialogue initiation, alongside Docker-based deployments that facilitate integration with services like Ollama or LMStudio.
The library uses environment variables for configuring server settings and provider details while offering features like default ignore folders to streamline document indexing. RAGLight is demonstrated through examples for creating knowledge bases from directories or GitHub repositories, setting up both RAG and agentic RAG pipelines, and enabling hybrid search functionalities that combine BM25 with semantic search techniques. Additionally, it supports custom processors tailored to specific file types such as PDFs containing diagrams. Overall, RAGLight stands out as a robust tool for developing sophisticated AI applications by merging retrieval methods with generative models.
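The Reciprocal Rank Fusion (RRF) step behind the hybrid search described above can be sketched as follows. This is a generic RRF implementation using the conventional k = 60 constant, not RAGLight's exact code:

```python
def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse multiple ranked lists (e.g. BM25 and semantic search results):
    each document scores sum(1 / (k + rank)) over the lists it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Higher fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc_a", "doc_b", "doc_c"]       # keyword ranking
semantic_hits = ["doc_b", "doc_a", "doc_d"]   # embedding ranking
fused = reciprocal_rank_fusion([bm25_hits, semantic_hits])
```

Documents ranked highly by both retrievers rise to the top, which is why RRF is a common, parameter-light way to combine BM25 with vector search.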
Keywords: #phi4, BM25, ChromaDB, Docker Compose, Docker deployment, FastAPI server, FolderSource, GitHubSource, Google Gemini, LLM integration, LMStudio, Large Language Models, Mistral API, Ollama, OpenAI API, Python library, RAGLight, REST API, REST endpoints, RRF, Reciprocal Rank Fusion, Retrieval-Augmented Generation, agent pipeline, code execution, database access, document ingestion, document retrieval, embeddings, environment variables, health check, hybrid search, knowledge base, natural language inference, semantic search, vector store operations, vector stores
github.com 4 days ago
https://github.com/Bessouat40/RAGLight 4 days ago
https://raglight.mintlify.app/documentation/rest-api 4 days ago
|
1001.
HN
Show HN: Security Audit for Macs Running Local AI (Ollama, OpenClaw, LM Studio)
The "Mac Security Audit" script is a comprehensive tool developed to bolster the security of macOS systems, particularly those configured as AI workstations such as Mac Minis running applications like Ollama and OpenClaw. Its primary function is to identify prevalent misconfigurations and vulnerabilities including unsecured network bindings, weak authentication tokens, exposed Docker ports, and deactivated firewalls. The script operates in three distinct modes: audit-only for assessing security postures without taking corrective actions; a full audit mode that includes firewall assessments; and an auto-fix mode which automatically addresses rectifiable issues.
Central to its functionality, the script scrutinizes macOS-specific security settings such as firewall activation status, FileVault encryption integrity, and remote access configurations. It also evaluates AI agent security by examining the status of OpenClaw gateways and the robustness of authentication tokens. Additionally, it audits network services by checking listening ports and exposures via Tailscale, along with server-related configurations like sleep settings. The script is compatible with macOS version 12 or newer and relies on Bash version 3.2+, employing native tools without necessitating external dependencies.
Upon execution, the script provides a detailed output delineating the status of each security check conducted, categorizing findings into critical issues, informational notes, warnings, and auto-fixed problems. The project is open for contributions aimed at enhancing its functionality with additional checks or installation methods, distributed under an MIT license.
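To illustrate the kind of checks such an audit performs, here is a sketch of two of them in Python; the real project is a Bash script, and the helper names here are hypothetical. The parsed strings correspond to macOS's `fdesetup status` output and the firewall `globalstate` value from `com.apple.alf` (0 = off, 1 = on for specific services, 2 = block all incoming):

```python
def filevault_enabled(fdesetup_output: str) -> bool:
    """Interpret `fdesetup status` output, which prints
    'FileVault is On.' or 'FileVault is Off.' on macOS."""
    return "FileVault is On" in fdesetup_output

def firewall_enabled(globalstate_output: str) -> bool:
    """Interpret the application firewall's global state, e.g. from
    `defaults read /Library/Preferences/com.apple.alf globalstate`."""
    return globalstate_output.strip() in ("1", "2")
```

In an audit-only mode these predicates would just drive pass/fail reporting; an auto-fix mode would additionally invoke the corresponding enable commands.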
Keywords: #phi4, AI Agents, Auto-fix, Auto-restart, Bash, Critical Issues, Docker, FileVault, Firewall, Gatekeeper, Hardening Script, Homebrew Formula, LM Studio, LaunchAgents, Listening Ports, Local AI Workstations, MIT License, Mac Minis, Network Exposure, Ollama, OpenClaw, Remote Access, SIP, SSH, Security Audit, Security Checks, Sleep Settings, Software Updates, Tailscale, macOS
github.com 4 days ago
|
1246.
HN
Building an Inference Engine in 1,800 Lines of C++
The article details the development of "toasted.cpp," a local inference engine written in C++ that significantly speeds up a 30-billion-parameter model, reaching 100 tokens per second on a MacBook, a substantial improvement over the previous Python implementation. This advance was driven by key architectural and design choices, such as using Qwen3-Coder-Next with a Mixture-of-Experts (MoE) and hybrid attention architecture to manage large context sizes efficiently. Optimization techniques played a crucial role, including transitioning from Python to C++ through MLX's API, which improved graph-fusion support and addressed issues like type leaks and inefficient GPU operations. Prefill was also restructured into chunked batches, improving prefill speed dramatically.
Architectural innovations included implementing a session cache that minimized redundant processing in unchanged conversation histories, improving response times by 125x, and compiled step functions to reduce CPU-side graph construction overheads, optimizing token generation speed. Insights from the project highlighted that substantial performance gains typically result from architectural changes rather than micro-optimizations. Large Language Models (LLMs) were found more adept at code generation than optimization due to their reliance on pattern matching over system-specific reasoning.
Additionally, the unique unified memory architecture of Apple Silicon necessitated a shift in optimization strategies, moving away from traditional discrete GPU bottlenecks. The distribution strategy for the model involved using rsync for efficient file transfer with features such as resumable downloads and delta transfers. Overall, the project showcases significant performance improvements through innovative architectural changes and offers insights into system understanding versus pattern recognition in AI optimization tasks.
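The session-cache idea, skipping recomputation for the unchanged portion of a conversation, can be sketched as prefix matching over token IDs plus chunked prefill of the remainder. This is an illustrative Python sketch, not toasted.cpp's C++ code, and the chunk size is arbitrary:

```python
def reusable_prefix(cached: list, new: list) -> int:
    """Length of the longest common token prefix between the cached
    conversation and the new prompt; only the suffix past this point
    needs to be prefilled again."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n

def chunks_to_prefill(new: list, reuse: int, chunk: int = 512) -> list:
    """Split the uncached suffix into fixed-size batches for chunked prefill."""
    suffix = new[reuse:]
    return [suffix[i:i + chunk] for i in range(0, len(suffix), chunk)]

cached_tokens = [1, 2, 3, 4]
new_tokens = [1, 2, 9, 9, 9]
reuse = reusable_prefix(cached_tokens, new_tokens)
```

When the history is unchanged and only a new user turn is appended, `reuse` covers nearly the whole prompt, which is the source of the large response-time improvement the article reports.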
Keywords: #phi4, C++, DeltaNet, Inference Engine, MLX, Mixture-of-Experts, Unix socket, compiled step functions, fp16 leak, macOS, optimization, rsync, session cache, speculative decoding
linuxtoaster.com 5 days ago
|
1343.
HN
2x Qwen 3.5 on M1 Mac: 9B builds a bot, 0.8B runs it
The article outlines the process of creating a Telegram bot using Qwen 3.5 models on an M1 Mac with limited resources, specifically 16 GB RAM. It involves setting up two main components: OpenCode, which utilizes the larger Qwen3.5-9B-GGUF model for coding tasks, and LM Studio, running the smaller Qwen3.5-0.8B-GGUF model to manage chat interactions. The setup requires installing OpenCode through command line instructions and configuring it alongside a local instance of LM Studio that functions as an OpenAI-compatible server on localhost.
The author demonstrates how the Telegram bot forwards messages to this local configuration, retrieves responses, and maintains data privacy by operating offline. Although the hardware constraints result in slower performance, the setup proves beneficial for small teams prioritizing confidentiality in their workflows. The article suggests potential improvements with more advanced Apple Silicon or stronger desktop setups. Essential steps include installing OpenCode, setting up LM Studio with specific models, and developing a Python-based Telegram bot within a virtual environment. This configuration emphasizes local data handling and offline operation, offering an alternative for sensitive tasks on limited hardware without replacing high-end coding stacks.
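The two-model split can be sketched as a simple router that builds an OpenAI-style chat payload and picks the larger model only for code-like messages. The model names and the heuristic below are illustrative, not the article's exact setup:

```python
def build_request(message: str) -> dict:
    """Route a message to one of two locally served models:
    the small one for chat, the large one for coding tasks."""
    code_markers = ("```", "def ", "class ", "import ", "traceback")
    looks_like_code = any(m in message.lower() for m in code_markers)
    return {
        # Hypothetical model identifiers as exposed by a local server
        "model": "qwen3.5-9b" if looks_like_code else "qwen3.5-0.8b",
        "messages": [{"role": "user", "content": message}],
    }

payload = build_request("what's the weather like?")
```

The Telegram bot would POST such a payload to LM Studio's OpenAI-compatible endpoint on localhost, keeping all traffic on the machine.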
Keywords: #phi4, API endpoint, Apple Silicon, GitHub repository, JSON schema, LM Studio, MacBook M1, Metal llamacpp, OpenAI-compatible endpoints, OpenCode, Qwen35, RAM usage, Telegram bot, coding model, context window, environment variables, hardware performance, inference backend, local server, offline tasks, private workflow, python-telegram-bot, reply model, sensitive data, tokens, venv
advanced-stack.com 6 days ago
|
1514.
HN
Real-time global intelligence dashboard for news and geopolitical monitoring
World Monitor is an advanced AI-powered dashboard designed for comprehensive global intelligence, news aggregation, and real-time monitoring of geopolitical events, infrastructure developments, and natural disasters. It integrates various curated data sources into a unified interface featuring interactive maps with over 40 customizable data layers such as conflict zones, military activities, and environmental hazards. The platform supports multilingual access to 16 languages and offers AI-synthesized briefs, ensuring users can focus on specific areas like geopolitics or tech by seamlessly switching between different dashboard variants.
A standout feature is the interactive 3D globe powered by WebGL technology, which includes smart clustering for enhanced performance. This allows users to visualize complex datasets interactively and in real-time, leveraging AI-driven translation and semantic search capabilities through a Retrieval-Augmented Generation system. World Monitor's commitment to privacy is evidenced by its open-source framework, enabling local deployment on user hardware with secure storage of API keys via OS keychain integration.
The platform offers robust data processing features including real-time updates for various intelligence signals like market trends and military movements. It also incorporates live video streaming capabilities ensuring continuous playback across devices. Signal aggregation includes anomaly detection using Welford’s algorithm, providing temporal tracking of global events while supporting social sharing with rich previews via dynamic Open Graph images.
Designed to offer a seamless experience, the dashboard is available as both a Progressive Web App and through Tauri for desktop use, facilitating offline functionality and local API handling. Additionally, it integrates multiple advanced intelligence capabilities such as maritime and aviation tracking, prediction market analysis, and security advisories from numerous sources. Infrastructure resilience modeling and GPS interference mapping are key features enhancing its analytical depth.
The system’s configuration interface allows users to manage settings like language models and data source credentials without interruption, thanks to independent verification pipelines for each tab. It supports automatic model discovery with fallback options and utilizes a JSON blob in the OS keychain to synchronize changes across UIs efficiently. Debugging is facilitated through verbose mode logs and accessible DevTools.
Updates are managed via an auto-update checker, ensuring users have access to the latest features without service interruption, while smart caching strategies optimize performance, particularly for offline map browsing. The dashboard's design incorporates mobile optimization, allowing drag-and-drop reordering and intelligent alert popups to enhance user interaction.
For strategic intelligence and forecasting, World Monitor employs a tiered AI summarization approach using both local and cloud-based models optimized for network conditions, ensuring efficient processing and result caching. It provides detailed country dossiers with instability indices and predictive analytics. The system also features sophisticated threat classification and hotspot escalation scoring to dynamically assess geopolitical risks.
Furthermore, the platform integrates real-time data from various sources, including military intelligence, cyber threat feeds, and natural disaster monitoring using Open-Meteo ERA5 datasets for climate anomaly detection. This integration allows comprehensive risk assessment by combining insights into strategic theater postures, undersea cable health, and infrastructure dependencies.
In essence, World Monitor offers a holistic solution for global monitoring and analysis, leveraging cutting-edge technology to deliver actionable intelligence through a user-friendly interface that supports diverse analytical needs and operational contexts.
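The anomaly detection attributed above to Welford's algorithm can be sketched as an online mean/variance tracker that flags observations by z-score. This is a generic implementation, with the 3-sigma threshold chosen for illustration rather than taken from World Monitor:

```python
import math

class WelfordDetector:
    """Online mean/variance via Welford's algorithm; flags a new
    observation as anomalous when its z-score exceeds a threshold."""

    def __init__(self, threshold: float = 3.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations
        self.threshold = threshold

    def update(self, x: float) -> bool:
        anomalous = False
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                anomalous = True
        # Welford's single-pass update
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

detector = WelfordDetector()
signal_counts = [10, 11, 9, 10, 12, 10, 50]  # final value spikes
flags = [detector.update(x) for x in signal_counts]
```

The single-pass update is what makes this suitable for streaming feeds: no history needs to be retained per signal, only three numbers.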
Keywords: #phi4, ACLED, AI Summarization, AI forecasting, AI-powered aggregation, AIS Detection, API Keys, CORS, Cache Purge, Circuit Breakers, Climate Anomaly Detection, Climate Panel, Command Palette, Country Export, Country Instability Index, Cyber Threat Intelligence, Data Freshness, Deduction Panel, Download API, EONET, ERA5 reanalysis, Edge Functions, Feature Toggles, Forecasting, GDACS, GDELT, GPS Interference, GPS/GNSS Interference, GeoJSON, Geopolitical analysis, Groq LLM, HMR, Haversine-deduplication, Headline Memory, Historical Playback, Humanitarian Data, IOCs, Infrastructure Cascade Modeling, Intelligence Dossier, ML Worker, Map Overlay, Map State, Military Surge Detection, Mobile Optimization, Natural Disaster Monitoring, OREF Alert, Oil Analytics, Open-Meteo, OpenAI-compatible endpoint, Population Estimation, Protest Tracking, Protocol Buffers, RPC, Real-time intelligence, Redis Deduplication, Redis caching, Regression Testing, Service Monitoring, Stock Indices, Strategic Risk Score, TV Mode, Telegram Feed, Telegram OSINT Feed, Travel Advisory, Trending Keywords, UCDP Conflict, Undersea Cable Monitoring, Universal Coverage, Vercel, configuration UI, geolocation, geopolitical monitoring, infrastructure tracking, live video streams, market analysis, multilingual support, news context, news feeds, rate-limiting, scatter dots, semantic search, signal aggregator, threat classification
github.com 6 days ago
|
1597.
HN
Show HN: ClawShield – Open-source security proxy for AI agents (Go, eBPF)
ClawShield is an open-source security proxy crafted to safeguard AI agents, built with Go and eBPF. Positioned as a defensive layer in front of the OpenClaw AI gateway, its primary function is to scrutinize all incoming and outgoing communications through several scanning mechanisms: prompt injection detection, secrets/PII identification, vulnerability assessment, and malware recognition. The system operates under a deny-by-default policy framework, allowing customization via YAML configuration files for tool allowlists/denylists, domain restrictions, and specific agent/channel rules, with all decisions logged in SQLite for auditing.
Enhancing security further, ClawShield incorporates optional features like an iptables egress firewall to regulate network traffic and an eBPF kernel monitor that detects abnormal system behaviors such as fork bombs or privilege escalations. Its user-friendly setup process involves Docker commands, supporting installation through pre-built binaries or direct source compilation.
The architecture of ClawShield is grounded in a defense-in-depth strategy across three distinct layers: application-level message analysis with policy enforcement, network-layer egress management, and kernel-level syscall monitoring for detecting behavioral anomalies. As a production-ready tool, it can be deployed with additional security protocols such as TLS termination via Nginx. Moreover, ClawShield integrates five specialized AI agents equipped with RAG (Retrieval-Augmented Generation) knowledge bases, providing robust protection against threats like prompt injections and data leaks.
ClawShield is open for community contributions on GitHub under the Apache 2.0 license and builds upon the OpenClaw framework, adapting traditional network security models to suit AI environments. This makes it a versatile and comprehensive solution for fortifying AI agent ecosystems.
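The deny-by-default policy layer can be sketched as follows; a plain dict stands in for ClawShield's YAML configuration, and the field names are illustrative rather than the project's actual schema:

```python
def evaluate(tool: str, domain, policy: dict):
    """Deny-by-default check: a tool must be explicitly allowlisted,
    must not be denylisted, and any outbound domain must be allowed.
    Returns (allowed, reason) so every decision can be audit-logged."""
    if tool in policy.get("tool_denylist", []):
        return False, f"tool '{tool}' is denylisted"
    if tool not in policy.get("tool_allowlist", []):
        return False, f"tool '{tool}' is not allowlisted (default deny)"
    if domain is not None and domain not in policy.get("allowed_domains", []):
        return False, f"domain '{domain}' is not allowed"
    return True, "allowed"

policy = {
    "tool_allowlist": ["web_fetch", "read_file"],
    "tool_denylist": ["shell_exec"],
    "allowed_domains": ["example.com"],
}
```

The key property is the second branch: anything not explicitly permitted is refused, so a new or unknown tool fails closed instead of open.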
Keywords: #phi4, AI Agents, Audit Logging, Behavioral Anomaly Detection, Canary Token, ClawShield, Defense-in-Depth, Docker, Firewall, Go, HTTP WebSocket, Malware Detection, Network Security, Open-source, PII Scanning, Policy Engine, Real-time Alerts, Reverse Proxy, Secrets Detection, Security Proxy, Syscall Monitoring, TLS, Vulnerability Scanning, eBPF
github.com 7 days ago
|
1675.
HN
Show HN: Epstein-Search – Local, AI-Powered Search Engine for the Epstein Files
Epstein-Search is an open-source, AI-powered local search engine tailored for semantic searching of the Epstein Files, which comprise publicly accessible court documents, FBI reports, flight logs, and similar materials. Built in Python, it offers both command-line interface (CLI) functionalities and library features to conduct searches or operate Retrieval-Augmented Generation (RAG) models without the need for cloud services or API keys, ensuring privacy. The engine utilizes a local vector database called zvec, which stores pre-computed document embeddings for swift indexing and rapid querying. Users can execute standard searches locally using sentence-transformers to process query embedding and similarity searching against this indexed data.
In addition to traditional search capabilities, Epstein-Search introduces a conversational RAG mode via LiteLLM, supporting both local models like Ollama and external cloud providers such as Anthropic, OpenAI, or Gemini. The setup process is streamlined into three steps: installing the tool, configuring the database, and initiating an interactive chat interface. This involves downloading approximately 100K document chunks with pre-computed embeddings, allowing users to begin immediately.
The search functionality can be refined by filtering results based on specific document types like court filings or flight logs, and it enables displaying both raw source context and generated answers. The project encourages support through cryptocurrency donations, which are detailed in its GitHub repository. Importantly, the dataset is sourced from public domain materials, adhering to open access standards.
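The similarity search at the core of such a tool can be sketched as cosine ranking over stored embeddings. This stands in for what a vector database like zvec does internally and is not the project's actual code; the two-dimensional vectors are toy data:

```python
import math

def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query_vec, index: dict, top_k: int = 3):
    """Rank indexed document chunks by similarity to the query embedding."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]

index = {
    "flight_log_1": [1.0, 0.0],
    "court_doc_7": [0.0, 1.0],
    "report_3": [0.7, 0.7],
}
results = search([0.9, 0.1], index)
```

In the real system the query vector comes from a sentence-transformers model and the index holds the ~100K pre-computed chunk embeddings, but the ranking step is the same shape.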
Keywords: #phi4, AI-Powered, Cloud LLMs, DOJ, Epstein Files, Epstein-Search, FBI Reports, Flight Logs, Interactive Mode, LM Studio, Legal PDFs, LiteLLM, MIT License, Ollama, Open Source, Public Domain, Python CLI, RAG, Semantic Search, Sentence-Transformers, Vector Database, zvec
github.com 7 days ago
|
1915.
HN
Unsloth Dynamic 2.0 GGUFs
Unsloth Dynamic 2.0 introduces an advanced quantization technique that outperforms previous methods, achieving higher 5-shot MMLU accuracy and lower KL divergence. This approach facilitates fine-tuning across various inference engines such as llama.cpp and Ollama while maintaining high model accuracy. The method features a sophisticated layer-selection process tailored to optimize quantization schemes for both MoE and non-MoE architectures, leveraging a carefully curated calibration dataset to enhance conversational chat capabilities without the overfitting issues prevalent in earlier datasets.
Benchmarking highlights significant improvements with models like Aider Polyglot and Gemma 3, which often surpass full-precision counterparts using this method. Dynamic v2.0 efficiently manages quantization across layers, minimizing disk space usage while preserving accuracy. Tests on MMLU scores underscore the necessity of precise implementation to avoid performance drops. Additionally, the success is evidenced by bug fixes in Llama 4 models that substantially increased their MMLU Pro accuracy. Overall, Unsloth Dynamic 2.0 marks a substantial advancement in quantization technology, offering enhanced efficiency and improved model performance with reduced resource demands.
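The KL-divergence metric cited above compares the quantized model's next-token distribution against the full-precision one; lower values mean the quantized model tracks the original more closely. A minimal sketch over raw logits:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p_logits, q_logits) -> float:
    """KL(P || Q) where P is the full-precision model's next-token
    distribution and Q is the quantized model's."""
    p = softmax(p_logits)
    q = softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

d_same = kl_divergence([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])
d_diff = kl_divergence([2.0, 1.0, 0.1], [1.0, 2.0, 0.1])
```

In practice this is averaged over many positions of a held-out text, which is why it is a finer-grained quality signal than a single benchmark score.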
Keywords: #phi4, Dynamic, GGUFs, Gemma 3, KL Divergence, Llama 4, MMLU, QAT, accuracy, benchmarks, bug fixes, calibration dataset, disk space, efficiency, inference engine, layer selection, model-specific quants, overfitting, performance, perplexity, quantization, v20
unsloth.ai 8 days ago
https://unsloth.ai/docs/models/qwen3.5/gguf-b 8 days ago
https://huggingface.co/blog/moe 8 days ago
https://bknyaz.github.io/blog/2026/moe/ 8 days ago
https://github.com/qskousen/ggufy 8 days ago
https://huggingface.co/unsloth/Qwen3.5-35B-A3B-Experime 8 days ago
https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF 8 days ago
https://huggingface.co/unsloth/Kimi-K2-Thinking-GGUF 8 days ago
|
1967.
HN
A tool to launch your OpenClaw in just 1 minute
OpenClaw Hosting provides a managed cloud platform specifically tailored for the seamless deployment of OpenClaw, an open-source autonomous AI agent. It simplifies this process through its one-click deployment feature, eliminating the need for technical know-how or server management by users. The platform is compatible with any OpenAI-compatible model endpoint and offers complimentary access to sophisticated models like Kimi K2.5 from Moonshot AI. A key feature of OpenClaw Hosting is its commitment to privacy; each instance runs in a secure, isolated Docker container that ensures user data remains private and can be exported at any time. Moreover, OpenClaw’s integration capabilities extend across various messaging platforms including Telegram, WhatsApp, Discord, Slack, Signal, and iMessage, enabling continuous 24/7 operation across channels. This setup allows users to engage with the AI agent in diverse environments efficiently and securely.
Keywords: #phi4, Anthropic Claude, Discord, Docker setup, Google Gemini, Kimi K25, LM Studio, Moonshot AI, Ollama, OpenAI-compatible model, OpenClaw, SSL, Signal, Slack, Telegram, VPS, WhatsApp, auto-updates, autonomous AI agent, iMessage, infrastructure, isolated container, managed cloud platform, multi-channel agent, one-click deployment
clawhost.chat 9 days ago
|
2315.
HN
LM Link: Use local models on remote devices, powered by Tailscale
LM Link is a collaborative effort between LM Studio and Tailscale that provides a secure platform for sharing open-weight Large Language Models (LLMs) across devices without requiring public internet access. It allows users to operate LLMs on their private hardware while ensuring privacy and security through encrypted connections. The service simplifies the process of connecting remote devices, enabling model sharing with straightforward authentication procedures. LM Link caters to various applications such as accessing high-capacity models from home, providing team-wide access to substantial models, secure internal testing environments, enhancing edge computing capabilities, and supporting private industry-specific uses. The platform utilizes tsnet for handling secure traffic without altering the kernel, ensuring privacy in device communications. LM Studio aids in discovering and configuring devices, while Tailscale's private network streamlines technical setup processes. While free for personal use, enterprise solutions are also offered by LM Link.
Keywords: #phi4, GPU-backed models, Go program, LM Link, LM Studio, Tailscale, desktop app, device discovery, encrypted connections, end-to-end encryption, enterprise plans, keep-alives, local models, open-weight LLMs, private networks, remote devices, terminal commands, tsnet
tailscale.com 10 days ago
|
2366.
HN
LMStudio LM Link: Use your local models, remotely
LMStudio has developed the LM Link feature to facilitate secure, end-to-end encrypted connections between devices using custom Tailscale mesh VPNs. This functionality allows users to operate local models on remote devices seamlessly as if they were running locally, enhancing privacy and efficiency in operations. Importantly, this setup restricts chat interactions to individual devices, ensuring conversations remain private and confined to specific hardware. For discovery of compatible devices, only device lists are shared, with all other data kept strictly local, never uploaded to LM Studio's servers. This design underscores the commitment to user privacy and security by preventing unnecessary data transfer and maintaining complete control over personal information within a secure network environment.
Keywords: #phi4, LM Link, LM Studio, Tailscale mesh VPNs, backend servers, chats local, custom VPNs, device connection, device discovery, encryption, end-to-end encrypted, local models, model loading, remote devices
lmstudio.ai 10 days ago
|
2419.
HN
LM Link: Use local models on remote devices, powered by Tailscale
LM Link, developed in collaboration with LM Studio, provides a secure method for sharing large language models (LLMs) between devices owned by users without the risk of exposure on public networks. This is achieved through Tailscale's encrypted connections, facilitating easy and safe model distribution across personal and professional devices globally. The system simplifies access to remote LLMs via laptops or GPU-powered servers and ensures secure end-to-end encryption irrespective of network location. Setting up LM Link with LM Studio allows users to seamlessly connect local and remote models through desktop applications or terminal commands.
The platform offers broad applications, such as enabling home users to utilize powerful LLMs remotely, providing research teams access to advanced models, and enhancing edge devices' computational abilities. It supports industries prioritizing data privacy by keeping model use on-premises and allows developers to securely test large models without exposing their infrastructure. LM Link is built with tsnet in Go, ensuring secure, auditable device connections without the need for altering kernel configurations. Importantly, all communication between connected devices remains private as neither Tailscale nor LM Studio's backend service can access transmitted data. While LM Studio offers free personal use, enterprise plans are available for additional features. Users interested in starting with LM Link can visit their website to create an account and begin using the service.
Keywords: #phi4, GPU-backed models, Go program, LM Link, LM Studio, Tailscale, desktop app, device discovery, encrypted connections, end-to-end encryption, enterprise plans, keep-alives, local models, open-weight LLMs, private networks, remote devices, terminal commands, tsnet
tailscale.com 11 days ago
|
2455.
HN
I built an open-source AI Gateway that sits between your apps and LLM providers
The AI Gateway is a versatile, open-source, and self-hosted API gateway designed to facilitate communication between applications and various Large Language Model (LLM) providers. It offers an OpenAI-compatible API that efficiently routes requests to supported backend services such as Google Gemini, OpenAI, Anthropic, Mistral, Perplexity, xAI, Cohere, Azure OpenAI, Ollama, or LM Studio. This platform emphasizes individualized client management by providing unique API keys, assigning specific backends, and enforcing rate limits along with token quotas while allowing optional system prompts.
A key feature is its built-in admin dashboard, which provides real-time usage monitoring and comprehensive client management. The gateway supports streaming from every backend via Server-Sent Events (SSE) and offers flexible configuration, letting clients select different models and backend services as needed. Initial setup involves downloading or building the gateway, then running it to generate a configuration file and admin credentials, with further adjustments made through the administrative interface.
The project is meticulously organized into modules that manage HTTP requests, middleware functions, database models, backend providers, and implement robust security measures. These measures include hashed client API keys, signed cookies, security headers, and restrictions on request sizes to ensure secure operations. Additionally, per-client features such as model whitelists, rate limits, and token quotas enhance its functionality. Developed under the MIT license, the AI Gateway offers a scalable solution for managing LLM interactions across diverse applications and services.
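The summary mentions streaming via Server-Sent Events. As a generic illustration (not the gateway's actual code), OpenAI-compatible endpoints emit `data:` lines carrying JSON deltas, terminated by `data: [DONE]`; a minimal parser might look like this:

```python
import json

def parse_sse_chunks(lines):
    """Parse OpenAI-style SSE lines into text deltas.

    Each event looks like: data: {"choices":[{"delta":{"content":"Hi"}}]}
    The stream ends with:   data: [DONE]
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        event = json.loads(payload)
        delta = event["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]

# Example stream as it might arrive from an OpenAI-compatible endpoint:
stream = [
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"choices":[{"delta":{"content":", world"}}]}',
    "data: [DONE]",
]
print("".join(parse_sse_chunks(stream)))  # Hello, world
```

Because the wire format is the same across OpenAI-compatible providers, a gateway can proxy these events unchanged regardless of which backend produced them.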
Keywords: #phi4, AI Gateway, API, Docker, LLM, LLM providers, OpenAI-compatible, WebSocket, admin dashboard, architecture, rate limits, security headers, self-hosted, streaming, token quotas
github.com 11 days ago
|
2489.
HN
Swival – A coding agent for open models
Swival is a coding agent tailored for open models, integrating with tools such as LM Studio and the Hugging Face Inference API to complete tasks through an autonomous tool loop. Setup is simple: it automatically detects any model loaded in LM Studio without manual configuration. The implementation is compact, only a few thousand lines of Python, and deliberately avoids external frameworks, emphasizing ease of use and efficiency.
Keywords: #phi4, HuggingFace Inference API, LM Studio, Python, Swival, autonomous tool loop, coding agent, completion, framework, model discovery, open models, task, zero setup
swival.github.io 11 days ago
|
2517.
HN
Show HN: A CLI to query the unsealed court files with local LLMs
"epstein-search" is a free, open-source command-line interface tool designed for querying unsealed court files associated with Jeffrey Epstein, utilizing local Large Language Models (LLMs). It caters to audiences who prefer technical solutions and aligns well with Show HN post preferences. The tool efficiently processes thousands of poorly scanned PDF documents into searchable segments using a Retrieval-Augmented Generation (RAG) pipeline. Users can conduct semantic searches locally without the need for API keys, employing tools such as OpenAI or Anthropic for speed, or Ollama and Llama.cpp for privacy considerations.
The tool offers straightforward setup with three commands: `pip install epstein-search`, `epstein-search setup`, and `epstein-search chat`. It features an interactive mode where users can toggle between search-only and RAG modes, switch LLM models, modify result settings, and review current configurations. Support for both cloud-based and local LLMs allows flexibility based on user privacy needs.
Released under the MIT license, "epstein-search" is available to anyone interested. It ships with over 100,000 pre-computed document embeddings derived from publicly available sources such as U.S. Department of Justice releases and FBI reports, and accepts crypto tips from users who wish to support the project financially.
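The core idea behind semantic search over pre-computed embeddings can be sketched generically (this is an illustration of the technique, not the tool's internals; the vectors and chunk names are toy values — real embeddings come from an embedding model):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy pre-computed chunk embeddings keyed by chunk label.
chunks = {
    "flight logs, 2002": [0.9, 0.1, 0.0],
    "deposition transcript": [0.2, 0.8, 0.1],
    "property records": [0.1, 0.2, 0.9],
}

def search(query_vec, top_k=2):
    """Return the top_k chunk labels ranked by similarity to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, chunks[c]),
                    reverse=True)
    return ranked[:top_k]

print(search([0.85, 0.15, 0.05]))  # most similar chunks first
```

In a RAG pipeline, the top-ranked chunks are then passed to the LLM as context, so only the retrieval step needs the full embedding set.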
Keywords: #phi4, CLI, Hacker News, LLMs, PDF parsing, RAG pipeline, document chunks, embeddings, interactive mode, local model, open-source, privacy-first, semantic search, vector database
github.com 11 days ago
|
2544.
HN
LM Studio: LM Link
LM Link is a feature in LM Studio designed to facilitate secure, end-to-end encrypted communication between devices running either LM Studio or llmster through custom Tailscale mesh VPNs. This functionality enables users to seamlessly access models on remote devices as if they were local, while ensuring chat privacy by limiting shared data to device lists for discovery purposes only. Importantly, no information is transmitted to backend servers, thereby enhancing user security and privacy in model interactions across devices.
Keywords: #phi4, LM Link, LM Studio, Tailscale mesh VPNs, backend servers, connection, custom VPNs, device discovery, devices, end-to-end encrypted, local chats, models, remote devices
lmstudio.ai 11 days ago
|
2744.
HN
Worldmonitor: Real-time global intelligence dashboard
The document describes a sophisticated global intelligence platform designed for real-time insights into worldwide events through comprehensive data integration from over 100 sources. It features an interactive UI with AI-generated briefs and multilingual support, offering users access to synthesized summaries of geopolitical occurrences, natural disasters, cyber threats, and more without the need for costly OSINT tools.
Key functionalities include a highly interactive 3D globe that uses WebGL technology to display various data layers such as military bases, disaster zones, and cyber risks. This platform emphasizes AI-powered intelligence with capabilities like focal point detection and hybrid threat classification systems which are enhanced by local large language models (LLMs). Additionally, it provides real-time updates on geopolitical events, social unrest, market movements, and infrastructure statuses.
The system supports localization through a multilingual user interface capable of delivering region-specific news feeds in 16 languages. It also includes signal intelligence context, prediction market integration, and multiple export options while allowing users to customize their experience with panel resizing, theme toggling, and data source management via feature toggles.
A strategic maritime monitoring component classifies vessels based on MMSI prefixes and AIS data using a relay server, generating heatmaps for traffic density analysis. The platform employs advanced user interface adjustments such as responsive layout optimizations for ultra-wide monitors without relying on JavaScript, and theme switching stored in localStorage.
The intelligence system offers comprehensive country briefs enriched with geopolitical insights and real-time signals. It utilizes GeoJSON for local-first detection of countries from map interactions, ensuring quick identification without network dependence. AI summarization processes are optimized through tiered analysis using local computation, cloud APIs, or browser inference to minimize redundant processing while employing Redis caching.
The threat classification pipeline integrates keyword pattern matching with machine learning and LLM classifiers for detailed event categorization based on severity levels. Country Instability Index (CII) scores dynamically assess 22 countries' stability in real-time by considering factors like unrest, security activity, and news velocity. Geographic convergence detection identifies simultaneous events within cells to inform strategic assessments across multiple regions.
Additional capabilities include undersea cable health monitoring with data sources such as NGA warnings and AIS tracking, infrastructure cascade modeling for predicting disruption propagation, and climate anomaly detection using ERA5 data. The platform also tracks refugee movements via UN OCHA data, influencing instability indices by estimating population exposure from active events using WorldPop data. Strategic port infrastructure models assess ports based on trade and military significance.
The document highlights a browser-side ML pipeline utilizing Transformers.js to facilitate localized AI processing, reducing server dependency while enabling efficient news data processing. Real-time visuals from geopolitical hotspots are provided through YouTube live streams with optimized resource management. Privacy-first analytics via PostHog ensure user anonymity while implementing privacy measures in data collection and reporting.
The platform adopts a tri-variant architecture supporting three specialized dashboards—World Monitor, Tech Monitor, and Finance Monitor—with distinct data feeds and panels. It utilizes Vercel Edge Functions and Railway relay servers for efficient API request management and caching. The system is encapsulated in a Tauri desktop application with offline capabilities via local Node.js sidecars. Secure secret management is achieved by storing API keys in OS credential managers, ensuring user privacy and data security.
Overall, the platform delivers actionable intelligence across geopolitical, technological, and financial domains using advanced data processing, machine learning techniques, and responsive design, providing users with a versatile tool for monitoring global events.
Keywords: #phi4, AI-powered dashboard, Real-time intelligence, anomaly detection, financial markets, geopolitical monitoring, global news aggregation, infrastructure tracking, interactive map, maritime tracking, multilingual support, predictive analytics, threat detection
github.com 12 days ago
|
2769.
HN
A small tool I made for local LLMs: LLM-neofetch-plus
LLM-Neofetch Plus extends NeoFetch with features for users running local Large Language Models (LLMs) under tools such as Ollama and llama.cpp. It reports detailed system information, including GPU VRAM with model identification (NVIDIA, AMD, Intel, or Apple M series), and estimates how many billions of parameters a machine can run efficiently. The tool explains various GGUF quantization methods, comparing options such as Q4_K_M and Q8_0, and helps compare LLM software solutions including Ollama, llama.cpp, vLLM, and LM Studio, with disk speed testing and results exportable in JSON or Markdown. Installation is simple via `pip install llm-neofetch-plus`, and detailed insights can be obtained with `llm-neofetch -d 3`. Users are encouraged to give feedback and suggest improvements on the project's GitHub page at [GitHub LLM-Neofetch Plus](https://github.com/HFerrahoglu/llm-neofetch-plus).
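The capacity estimate the summary describes can be approximated from VRAM and quantization density. A back-of-envelope sketch (the bits-per-weight figures are rough averages for GGUF formats, not exact, and the headroom value is an assumption):

```python
# Approximate average bits per weight for common GGUF quantizations;
# exact sizes vary with the tensor mix and file overhead.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q8_0": 8.5, "F16": 16.0}

def max_params_billions(vram_gb, quant, overhead_gb=1.5):
    """Estimate how many billions of parameters fit in VRAM,
    reserving headroom for the KV cache and runtime buffers."""
    usable_bytes = (vram_gb - overhead_gb) * 1e9
    bytes_per_weight = BITS_PER_WEIGHT[quant] / 8
    return usable_bytes / bytes_per_weight / 1e9

for quant in ("Q4_K_M", "Q8_0"):
    est = max_params_billions(24, quant)
    print(f"24 GB VRAM, {quant}: ~{est:.0f}B parameters")
```

The trade-off the tool explains falls out directly: Q4_K_M fits roughly 1.8x more parameters than Q8_0 in the same VRAM, at some cost in quality.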
Keywords: #phi4, AMD, Apple M series, GGUF quantization, GPU VRAM, GitHub, Intel, JSON, LLM-neofetch-plus, LM Studio, Markdown, NVIDIA, NeoFetch, Ollama, disk speed test, installation, llamacpp, local LLMs, parameters, vLLM
news.ycombinator.com 12 days ago
|
3029.
HN
Arcee-AI/Trinity-Large-Preview
Arcee AI's Trinity-Large-Preview is a sophisticated language model featuring 398 billion parameters, characterized by a sparse Mixture-of-Experts (MoE) architecture that includes approximately 13 billion active parameters per token. As the largest model within its family, it has been trained on over 17 trillion tokens and excels in understanding extensive contexts due to this vast training corpus. The model is a refined version of its base variant, enhanced through reinforcement learning specifically targeting improved chat functionalities.
The Trinity family includes several variants: the lightly post-trained Trinity-Large-Preview for immediate chat applications, the pre-anneal checkpoint Trinity-Large-TrueBase with 10 trillion tokens, and the fully pretrained Trinity-Large-Base model incorporating mid-training anneals. The architecture of Trinity-Large-Preview is designed to optimize both efficiency and capacity through its sparse MoE configuration, which includes 256 experts (with one shared among them), four active experts per token, and six dense layers. Initially trained with an 8,192-token context length that extends to 512k for processing, the model's design supports extensive contextual understanding.
In terms of performance benchmarks, Trinity-Large-Preview demonstrates superior capabilities compared to models like Llama 4 Maverick, particularly on evaluations such as MMLU and AIME. The training process involved pretraining with 17 trillion tokens using Datology and posttraining through instruction tuning on an additional 20 billion tokens using Prime Intellect's infrastructure. This was facilitated by 2,048 NVIDIA B300 GPUs employing hybrid data and expert parallelism.
Users can access Trinity-Large-Preview via several platforms, including the Transformers Library in Python, VLLM & llama.cpp with specific release versions, LM Studio, and the OpenRouter API for application integration. The model is distributed under the Apache License 2.0, with users encouraged to cite its use appropriately. It is available for testing at chat.arcee.ai, and detailed technical specifications are provided in an accompanying report.
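The efficiency of the sparse MoE design can be quantified from the figures above (398 billion total parameters, roughly 13 billion active per token):

```python
total_params = 398e9   # full parameter count
active_params = 13e9   # parameters touched per token (4 of 256 experts + dense layers)

active_fraction = active_params / total_params
print(f"Active per token: {active_fraction:.1%} of total weights")
```

Since per-token compute scales with active parameters, inference cost is closer to that of a 13B dense model while the full 398B of capacity remains available for routing.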
Keywords: #phi4, AIME 2025, API, Apache License, Arcee AI, Datology, Expert Parallelism, GGUF, GPQA-Diamond, HSDP, Llama 4 Maverick, MMLU, Mixture-of-Experts, NVIDIA B300 GPUs, OpenRouter, Prime Intellect, RL, Transformers, Trinity-Large-Preview, VLLM, active parameters, benchmarks, chat-ready, long-context comprehension, parameters, post-trained, sparse model
huggingface.co 13 days ago
|
3138.
HN
World Monitor – Real-time global intelligence dashboard
The provided text outlines a sophisticated intelligence monitoring system called World Monitor, which integrates AI technologies to aggregate global news, monitor geopolitical events, and track infrastructure developments in real-time. This platform offers a comprehensive interface with over 100 curated feeds, interactive map features, and multilingual support across 16 languages. Users can visualize various data layers such as conflicts, military bases, and climate anomalies on an interactive 3D globe rendered using WebGL technologies, enabling smooth performance and detailed situational awareness.
World Monitor ensures seamless user experience by supporting multiple variants (geopolitical, tech, finance) from a single codebase while providing region-specific content and interfaces. Its advanced clustering algorithms facilitate signal aggregation, anomaly detection, and trend analysis, visualized through charts and heatmaps on the dashboard. The platform emphasizes privacy by processing AI locally without external APIs, with offline capabilities enabled via caching strategies.
Key features include real-time updates from live video streams, financial intelligence tools, and the ability to export intelligence briefs. Additionally, it offers a desktop application running efficiently using Tauri for local data processing. Beyond these core functionalities, World Monitor integrates various specialized components such as maritime monitoring through AIS data for vessel tracking at strategic chokepoints, an alert system for high-priority findings, and regression tests for map overlays.
Further capabilities include synthesizing country-specific intelligence into comprehensive briefs supported by offline geometry services, a multi-tier AI model chain for summarization, classification pipelines for news items using machine learning and keyword matching, and scoring systems for instability and hotspot detection. It assesses military postures across operational theaters, monitors US Navy fleet deployments, enriches military flight data, and tracks undersea cable health.
The system also supports infrastructure cascade modeling to evaluate disruptions' impact on global trade routes and energy supplies while indexing entities to detect temporal anomalies in critical activities. It identifies keyword spikes for threat detection through RSS feeds analysis, employs Protocol Buffers for API contracts, merges cyber and natural disaster intelligence, and incorporates prediction markets for geopolitical insights. Overall, World Monitor presents a holistic tool for real-time strategic risk assessment across military, geopolitical, and environmental domains using cutting-edge AI and machine learning technologies.
Keywords: #phi4, AI-powered dashboard, Real-time intelligence, financial markets, geopolitical monitoring, global news aggregation, infrastructure tracking, interactive map, maritime tracking, market radar, multilingual support, predictive analytics, threat detection
github.com 13 days ago
|
3153.
HN
LLMs Feed Your Re Habit: Following the Use-After-Free Trail in CLFS
The article delves into how Large Language Models (LLMs) augment reverse engineering (RE) by enhancing the process of identifying vulnerabilities such as use-after-free issues within intricate systems like Windows' Common Log File System (CLFS). By integrating LLMs with tools like pyghidra-mcp and Ghidra, engineers can accelerate analysis through improved understanding of control flows, identification of critical code paths, and automated validation against real binaries. This synergy reduces manual effort significantly, allowing for a deeper focus on investigation rather than repetitive verification tasks.
LLMs serve as an augmentation to human expertise, facilitating faster and more intuitive analyses by automating routine tasks such as cross-referencing functions and generating call graphs. These advancements lead to substantial time savings—up to 90% faster analysis—and increase confidence levels due to automated validation steps. The potential of LLMs extends beyond familiar territories, promising transformative impacts on RE workflows even in less familiar areas like macOS XPC services.
The article emphasizes the development of skills that leverage these AI tools to maintain investigative momentum without getting entangled in complex details. Overall, integrating LLMs into reverse engineering practices creates a more efficient and engaging workflow, enhancing control and confidence for engineers navigating sophisticated systems and highlighting the benefits of AI-driven analysis in technical investigations.
Keywords: #phi4, CLFS, CVE-2025-29824, FsContext2, Ghidra, IOCTL, IRP, LLMs, Windows kernel, agentic RE, automation, binary analysis, confidence, dataflow, local LLMs, macOS XPC services, momentum, patch diffing, pyghidra-mcp, reverse engineering, triage, use-after-free, vulnerability
clearbluejar.github.io 13 days ago
|
3224.
HN
OpenCrabs: AI terminal-native orchestration layer for software development
OpenCrabs is an advanced AI orchestration tool for software development, written in Rust and inspired by OpenClaw. It integrates with multiple AI providers, supporting a wide range of models from platforms such as Anthropic, OpenAI, and OpenRouter. The tool offers multimodal input, including text, images, PDFs, and voice transcription via Telegram integration, with further integrations planned for Slack, Discord, and WhatsApp. It features a sophisticated terminal-native interface with advanced cursor navigation, session management, and syntax highlighting.
The core functionalities of OpenCrabs include real-time streaming support, local language model usage, cost tracking, and built-in tools for file operations, web searches, and task planning. Its agent capabilities are notable, allowing the AI to modify its source code and hot-restart itself autonomously, thereby enhancing its utility in complex software development workflows.
OpenCrabs offers several quick start options: downloading pre-built binaries, building from source with Rust nightly and API keys, or using Docker for a containerized environment. The tool comes with an onboarding wizard that assists users through the initial setup process, including provider selection and agent customization. Configuration is managed via `keys.toml`, emphasizing security by recommending restrictive file permissions.
Local LLMs like LM Studio and Ollama are supported for private operation without incurring costs. OpenCrabs employs a built-in tool execution system to enhance task performance during interactions, facilitated by its plan mode and sophisticated memory management using a 3-tier architecture. Debugging is streamlined with conditional logging that can be enabled in debug mode.
The architectural overview highlights key components: presentation layers (CLI/TUI), intelligence management (brain layer), application services, and data storage through SQLx and SQLite. Integration supports various messaging platforms and voice tools, relying on technologies such as Tokio for asynchronous operations, Serde for serialization, Reqwest for HTTP requests, and qmd for hybrid memory search using FTS5 and vector embeddings.
OpenCrabs is actively developed under an MIT license with community contributions encouraged. It emphasizes privacy by ensuring data remains local to the user's machine, supporting offline functionality except when initial setup or external API calls necessitate internet access. Users are responsible for managing their own cloud-based API costs but can opt for local LLMs to avoid expenses.
In summary, OpenCrabs is a comprehensive AI orchestration tool aimed at streamlining software development processes through its secure, efficient platform and versatile integration capabilities, while maintaining user data privacy and offering customizable command functionalities.
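The summary notes that credentials live in `keys.toml` with restrictive file permissions recommended. A hypothetical layout might look like the following (the section and field names are illustrative only, not taken from the project):

```toml
# keys.toml — keep readable by the owner only: chmod 600 keys.toml
[anthropic]
api_key = "sk-ant-..."

[openai]
api_key = "sk-..."

[telegram]
bot_token = "123456:ABC..."
```

Keeping secrets in a separate file with owner-only permissions means the rest of the configuration can be committed or shared without leaking keys.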
Keywords: #phi4, AI orchestration, API Cost Management, CLI, Discord Integration, Feature Flags, Hot Restart, Local LLMs, MIT License, Memory Search, OpenCrabs, Rust, SQLite, Self-Updating, Session Persistence, Slash Commands, TUI, Telegram Bot, Vector Embeddings
github.com 13 days ago
|
3244.
HN
Worldmonitor: Real-time global intelligence dashboard
World Monitor is an innovative open-source dashboard designed to consolidate AI-powered news, geopolitical intelligence, and infrastructure data into a cohesive interface. It features interactive maps with over 35 data layers, real-time updates, and supports multiple languages through multilingual UIs. The platform offers strategic insights via AI synthesis of information using local LLM solutions like Ollama and provides comprehensive country briefs.
Key functionalities include an Interactive 3D Globe powered by WebGL for seamless performance, supporting diverse data types such as conflicts and natural disasters. The dashboard integrates a drag-and-drop interface with saved customizations in `localStorage`, alongside pin mode map interactions to enhance user experience. Regression testing of map overlays is conducted using Playwright, ensuring reliability.
Country brief pages are detailed, providing users with intelligence dossiers featuring instability indexes, AI-generated summaries, and real-time active signals. The system employs a multi-tiered AI summarization chain prioritizing local computation, falling back on cloud APIs when necessary, which helps minimize LLM calls through deduplication and caching.
The Threat Classification Pipeline processes news items via keyword matching and AI-based assessments to evaluate threats effectively. Additionally, the Country Instability Index (CII) offers continuous monitoring with a real-time score based on various factors like unrest events. This is complemented by hotspot escalation scoring, geographic convergence detection, and strategic theater posture assessments.
World Monitor incorporates USNI fleet intelligence, aircraft enrichment through ADS-B data, and undersea cable health monitoring to deliver comprehensive insights into military activities and infrastructure vulnerabilities. Infrastructure cascade modeling identifies critical assets within proximity of geo-located events, supported by temporal baseline anomaly detection for event type deviations.
The system provides a robust technical architecture using Vercel Edge Functions for API handling and Railway as an alternate feed origin, along with a Tauri desktop application that supports offline functionality via local Node.js processing. Security measures like CORS and input sanitization are in place, alongside optimizations such as CDN caching and intelligent polling to conserve bandwidth.
Designed with privacy at its core, World Monitor can function entirely offline, offering users control over their data security while enabling them to analyze diverse datasets efficiently across geopolitical, technological, and financial domains. Its responsive design adapts across various screen sizes, providing a unified geospatial intelligence picture for decision-making.
Keywords: #phi4, AI summarization, AI-generated analysis, AI-powered, AIS Tracking, AIS chokepoint detection, API gateway, Auto-Update, Breaking alerts, CSS custom properties, Country Brief, Cyber Threats, D3js, Disaster Monitoring, ETF flow estimation, GeoJSON, Groq, Instability Index, Intersection Observer, ML Pipeline, Map pin, Military Activity, OS keychain integration, Ollama, OpenRouter, PWA, Playwright, Privacy Architecture, Protest Data, Real-time intelligence, Redis, Strategic Ports, Theme System, Undersea Cables, Webcam Surveillance, YouTube proxy, anomaly detection, bandwidth optimization, caching architecture, climate anomalies, cloud fallback, cloud keys, convergence scoring, data freshness, desktop app, desktop auto-update, download API, drag-and-drop, edge functions, energy analytics, entity extraction, error tracking, exportable intelligence, feature toggles, feed tiering, financial centers, geopolitical hotspots, geopolitical monitoring, global markets, iframe, infrastructure tracking, intelligence gaps, interactive map, keyword monitoring, lazy-loading, live video streams, live webcam, local LLM, localStorage, macro signals, maritime tracking, multi-platform architecture, multi-source integration, multilingual support, news aggregation, non-tier country support, offline support, panel resizing, population exposure, prediction markets, regression testing, secret management, security model, semver comparison, service worker, signal aggregation, source tiering, stablecoin monitoring, threat classification, traffic logging, trend detection, trending keywords, tri-variant architecture, velocity metric, virtual scrolling
github.com 14 days ago
|
3274.
HN
Running autonomous AI agents on Apple Silicon for $1.50/month (14 errors later)
The author details their experience of running autonomous AI agents locally using OpenClaw on Apple Silicon for a minimal electricity cost of $1.50/month, highlighting significant savings over cloud API fees. OpenClaw, an open-source framework introduced in January 2026, enables these agents to autonomously perform tasks such as managing Slack messages, conducting research, and monitoring infrastructure without constant supervision. The author's setup operates on macOS or Linux with a minimum of 32GB RAM, utilizing network isolation tools like Tailscale for secure operations.
Despite initial challenges involving configuration errors and performance issues under sustained load, the system was optimized by adjusting settings in LM Studio and OpenClaw to improve efficiency. This includes handling tasks like context window adjustments, quantization settings tuning, and concurrency optimization. The robust setup allows AI agents to handle specific workflows autonomously, demonstrating cost-effective management of self-hosted AI systems.
The implementation leverages a dual-slot architecture for web chat tasks with minimal latency by utilizing caching, significantly enhancing efficiency compared to cloud services. The author integrates this system into their workflow via Slack, where tasks are processed asynchronously and results returned without direct oversight, facilitating operations like system monitoring and data summarization.
An innovative aspect of the setup involves training OpenClaw agents using another AI (Claude Code), which translates memory entries into workspace configurations that dictate local model responses. This process is likened to onboarding an employee, underscoring the customization potential for specific tasks.
The cost analysis underscores low electricity expenses as a primary benefit, with hardware amortization considered in future upgrades and software issue resolutions. The document concludes by offering guidance on maintaining system functionality through specific configuration adjustments in LM Studio and OpenClaw files, along with applying patches post updates to ensure ongoing performance.
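The headline $1.50/month figure is plausible as a back-of-envelope calculation. A sketch under assumed values (the average draw and electricity rate below are illustrative, not taken from the article):

```python
avg_draw_watts = 30        # assumed average draw of a mostly idle Apple Silicon Mac
hours_per_month = 24 * 30  # always-on operation
rate_usd_per_kwh = 0.07    # assumed electricity rate

kwh = avg_draw_watts * hours_per_month / 1000
cost = kwh * rate_usd_per_kwh
print(f"{kwh:.1f} kWh/month, about ${cost:.2f}")
```

The cost scales linearly with both draw and rate, so a heavier sustained inference load or a pricier utility would raise it proportionally, but it remains far below typical cloud API spend.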
Keywords: #phi4, AGENTSmd, AI agents, API calls, Apple Silicon, CPU cores, CVE-2026-25253, GPU memory, GPU threads, GitHub, Jinja template, LM Studio, LM Studio API, Linux, Mac, Ollama, OpenAI, OpenClaw, OpenRouter, OpenRouter models, Peter Steinberger, SSH tunnel, Slack, Slack tokens, Tailscale, UFW, VPN, auth layers, compaction mode, configuration, context window, defense in depth, electricity costs, environment file, errors, firewall, gateway token, knowledge files, macOS, model registry, patching, performance tuning, prompt injection, quantization, sandboxing, security, session history, speculative decoding, sub-agent delivery, sub-agents, sysctl, timeoutSeconds, tokens per second, workspace files
ianlpaterson.com 14 days ago
|
3333.
HN
FORTHought: Self hosted AI stack for physics labs (and more) built on OpenWebUI
FORTHought Project is a self-hosted AI research platform designed specifically for physics labs and STEM environments, built on the Open WebUI framework by Marios Adamidis. It facilitates various scientific workflows, including literature review, spectroscopy, electron microscopy, X-ray diffraction, and data analysis. The system utilizes an AMD Ryzen 9-based compute server for GPU-intensive tasks alongside an Intel Xeon E5 server that hosts a fully containerized software stack, supporting on-premise operations with services defaulting to localhost.
The platform's core functionalities include LLM inference using LM Studio and cloud APIs like Gemini and OpenRouter, code execution via a GPU-accelerated Jupyter kernel, document parsing through Docling with PyTorch ROCm integration, and vector search by Qdrant for hybrid BM25 + semantic searches. Custom tools enhance its capabilities, such as an optimized RAG pipeline for scientific documents, local models for reranking and embeddings, a free web search tool using the LangSearch API with integrated reranking, and skill-gated routing profiles tailored to various research needs.
Additionally, MetaMCP Orchestration centralizes tool servers, easing integration without disrupting existing workflows. Specific tools developed within FORTHought include OriginMCP for automating OriginLab tasks via a COM API, Papers MCP for literature search, XRD Server for X-ray diffraction analysis, SEM Micro for SEM image analysis, and PL Server for photoluminescence experiment planning with material recommendations.
The project offers custom Open WebUI functions to support integration, processing, and action execution, alongside infrastructure notes detailing reranker configuration and critical environment variables. Its open-source nature is governed by the MIT License, acknowledging community contributions in developing specific functions and patches. The roadmap suggests future enhancements such as browser automation for instrument control and expanded analysis capabilities, reflecting a forward-looking approach to expanding its utility and effectiveness in research environments.
Keywords: #phi4, AI stack, AMD Lemonade, Docker, FORTHought, GPU, Jupyter, MCP servers, MetaMCP, OpenWebUI, OriginLab, PyTorch, Qdrant, RAG pipeline, ROCm, Tailscale, X-ray diffraction, agent profiles, analysis, automation, data analysis, document parsing, electron microscopy, generation, literature review, local models, physics labs, planning, reranking, scientific Python, skill-gated routing, spectroscopy, vector search, web search
github.com 14 days ago
|
3352.
HN
Local LLM Setup on Windows with Ollama and LM Studio (ThinkPad / RTX A3000 GPU)
The document is a guide to setting up a local Large Language Model (LLM) on Windows using Ollama and LM Studio, written for a ThinkPad equipped with an RTX A3000 GPU. It walks through the technical setup process, and the accompanying page invites reader feedback, asking users who want follow-up contact to leave an email address.
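Once Ollama is running locally, clients talk to it over a small HTTP API on port 11434. A hedged sketch of the request payload a client would send to Ollama's chat endpoint (model name and prompt are placeholders; no network call is made here):

```python
# Hedged sketch: the JSON shape sent to Ollama's local chat endpoint
# (POST http://localhost:11434/api/chat). "llama3" is an example model.

def build_chat_request(model, prompt, stream=False):
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # False → a single JSON reply instead of chunks
    }

payload = build_chat_request("llama3", "Why run an LLM locally?")
```

LM Studio exposes an OpenAI-compatible endpoint (by default on port 1234) that accepts essentially the same `model`/`messages` shape, which is why many local-LLM clients can target either server interchangeably.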
Keywords: #phi4, GPU, LM Studio, Local LLM, Local LLM Setup, Ollama, RTX A3000 GPU, ThinkPad, Windows, contact, email address, feedback
github.com 14 days ago
https://github.com/gbro3n/local-ai/blob/main& 14 days ago
https://www.appsoftware.com/blog/local-llm-setup-on-win 14 days ago
|
3393.
HN
Show HN: Xpaper – A Chrome extension to turn your X feed into a newsletter
Xpaper is an open-source Chrome extension that converts a Twitter timeline into a formatted newsletter, letting users catch up without scrolling through Twitter. The extension scrapes data from the DOM directly within the browser, keeping user interactions private. For summarization it can use either cloud-based or local large language models (LLMs), offering a choice between services like OpenAI or Anthropic and privacy-first options such as Chrome's built-in AI or local LLMs. A standout feature is its ability to request permissions dynamically for connecting to local servers, working around restrictions in Manifest V3.
Security is a primary concern for Xpaper; it has been audited by multiple AI agents and human reviewers to mitigate risks like XSS vulnerabilities. The extension does not store user data externally, needing only an API key for cloud-based LLMs unless using local alternatives. Installation involves cloning the repository, installing dependencies via Bun or npm/yarn, building the extension, and loading it into Chrome as a developer extension. Users can configure preferences for AI providers and tailor their Twitter feed summaries with personalized prompts.
For privacy-conscious users, Xpaper supports local LLMs like Ollama or LM Studio, providing setup instructions to ensure data processing remains on the user's machine, thus enhancing security and usability.
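The personalized prompts mentioned above boil down to folding the scraped posts into one summarization request. A hedged sketch (not Xpaper's actual code; the instruction text is illustrative) of how such a prompt might be assembled before being sent to a cloud or local LLM:

```python
# Hedged sketch: assembling scraped timeline posts into a single
# newsletter-summarization prompt. Wording and style are hypothetical.

def build_newsletter_prompt(posts, style="concise daily digest"):
    body = "\n".join(f"- {p}" for p in posts)
    return (
        f"Summarize the following posts as a {style}, "
        f"grouping related items under short headings:\n{body}"
    )

prompt = build_newsletter_prompt(["New LLM release announced",
                                  "Tips for Chrome Manifest V3"])
```

Because the prompt is plain text, the same construction works whether the destination is a cloud API or a local Ollama / LM Studio endpoint, which is what lets the extension swap providers without changing its scraping logic.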
Keywords: #phi4, AI agents, Chrome Manifest V3, Chrome extension, Cloud APIs, DOM scraping, XSS, Xpaper, cross-machine access, local LLMs, multi-agent auditing, newsletter, privacy, security auditing
github.com 14 days ago
|