Scraper
Spider


2026-03-09 02:49
ollama stories from the last 14 days
150.  HN Ask HN: How are you handling persistent memory across local Ollama sessions
The author describes the difficulty of maintaining context across local Ollama sessions, where each session begins without prior knowledge, making manual context management inefficient. To address this, they built a proxy that stores recent interactions and injects them at the start of new sessions, though they have limited confidence in its architecture given a non-computer-science background. A significant challenge remains with scoping: preventing project contexts from mixing when working on multiple projects at once, currently handled through separate directories but seen as a stopgap rather than a robust solution. The author asks for more effective approaches to persistent memory and clean scoping, including whether vector databases, plain files, or MCP-based systems would fit. Keywords: #phi4, AI tools, MCP based, Ollama sessions, Persistent memory, context retention, local storage, project separation, proxy solution, retrieval, session scoping, stateless workflow, vector DB
    news.ycombinator.com 21 hours ago
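The proxy idea in this post can be sketched in a few lines: store the last few exchanges per project and prepend them to each fresh session's prompt. This is a hypothetical illustration of the approach, not the author's code; the class and field names are invented.

```python
from collections import deque

class ContextProxy:
    """Toy sketch of the post's proxy idea: keep the last few
    exchanges per project and prepend them to a fresh session."""

    def __init__(self, max_turns=5):
        self.max_turns = max_turns
        self.history = {}  # project name -> deque of (question, answer)

    def record(self, project, question, answer):
        turns = self.history.setdefault(project, deque(maxlen=self.max_turns))
        turns.append((question, answer))

    def build_prompt(self, project, new_question):
        # Inject stored context ahead of the new question.
        lines = []
        for q, a in self.history.get(project, ()):
            lines.append(f"Previously asked: {q}")
            lines.append(f"Previous answer: {a}")
        lines.append(new_question)
        return "\n".join(lines)
```

Keying the history on a project name is one way to get the clean scoping the author asks about, mirroring their separate-directories workaround without mixing contexts.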
216.  HN Show HN: Herd – Session-affine process pool for Go
Herd is a session-affine process pool library designed for Go that efficiently manages OS subprocesses while ensuring strict session affinity in routing HTTP traffic, so each session ID consistently maps to the same subprocess. This capability allows stateful binaries, such as headless browsers or language models, to operate as multi-tenant services without requiring complex coordination layers. Herd's key features include guaranteed session-to-worker routing, auto-scaling of workers based on demand, and eviction of idle workers using TTL (Time-To-Live). Additionally, it offers health monitoring for automatic replacement of failed processes and protects against simultaneous worker spawns through singleflight acquisition. The library supports various client types with its generic pool mechanism and incorporates a built-in reverse proxy to manage session lifecycles. Installation is simplified via `go get github.com/hackstrix/herd`, and documentation provides examples like transforming Ollama serve into a multi-tenant language model gateway, ensuring dedicated processes for each user, enhancing resource management. Herd's architecture centers around core interfaces such as Worker[C], WorkerFactory[C], and Pool[C], which manage subprocess instances, spawn new workers, and route sessions respectively. Configuration options include auto-scaling bounds, idle TTL settings, polling intervals for health checks, and custom crash handlers. The library is MIT licensed, encouraging community contributions and reviews. Keywords: #phi4, Auto-Scaling, Configuration Options, Go, HTTP Traffic, Health Monitoring, Herd, License, Multi-Agent Gateway, Ollama, Pool Router, Process Pool, Reverse Proxy, Session Affinity, Singleflight Acquisition, Subprocesses, TTL Eviction, Worker Factory, Workers
    github.com a day ago
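Herd itself is a Go library; the core guarantee it describes, that each session ID always maps to the same worker and that concurrent first requests spawn only one, can be illustrated language-agnostically. The sketch below (Python rather than Go, with invented names, not Herd's API) shows session affinity with a crude singleflight under a lock.

```python
import threading

class AffinePool:
    """Minimal illustration of session affinity: each session ID is
    pinned to one worker, and concurrent first requests for the same
    session create only a single worker (a crude singleflight)."""

    def __init__(self, spawn_worker):
        self.spawn_worker = spawn_worker  # factory, e.g. starts a subprocess
        self.workers = {}
        self.lock = threading.Lock()

    def acquire(self, session_id):
        with self.lock:
            worker = self.workers.get(session_id)
            if worker is None:
                # Only one caller per session ID ever reaches this line.
                worker = self.spawn_worker(session_id)
                self.workers[session_id] = worker
            return worker
```

Herd layers auto-scaling, TTL eviction, and health checks on top of this routing core; none of that is modeled here.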
356.  HN FASTEST LLM decode engine on Apple Silicon. 658 tok/s on M4-Max, beats MLX by 19%
MetalRT has emerged as the leading large language model (LLM) decode engine on Apple Silicon, particularly excelling on the M4 Max chip with a remarkable speed of 658 tokens per second. This performance surpasses the MLX framework by 19% and is notably faster than alternative engines like uzu, llama.cpp, and Ollama. The evaluation involved four quantized models—Qwen3-0.6B, Qwen3-4B, Llama-3.2-3B, and LFM2.5-1.2B—operating on an Apple M4 Max with 64 GB of RAM under macOS 26.3. MetalRT achieved superior performance in three out of four models tested, demonstrating a speed increase ranging from 1.10x to 2.40x over mlx-lm and llama.cpp respectively. It recorded its fastest response at 6.6 milliseconds for the first token of the Qwen3-0.6B model. Although uzu exhibited superior performance on Llama-3.2-3B, MetalRT consistently maintained higher decode speeds across models, positioning it as optimal for fast-response applications like chat interfaces and voice systems. The benchmark ensured fairness by using identical model files for MetalRT and mlx-lm; however, llama.cpp and Ollama used GGUF files with additional REST API overhead. Despite these differences, the output quality remained consistent across all engines, highlighting that performance variations were purely in terms of speed. Keywords: #phi4, 4-bit quantized, Apple Silicon, LLM, M4 Max, MLX, MetalRT, Ollama, REST API, benchmarking, chat apps, decode engine, inference framework, llamacpp, macOS, privacy-first apps, speedup, throughput, time-to-first-token, tokens per second
    www.runanywhere.ai 2 days ago
964.  HN Show HN: Teaching Tokens: Implementing Private, Lightweight AI in the Classroom
"Show HN: Teaching Tokens" presents an innovative app designed for classroom use, aimed at facilitating the teaching of AI fundamentals through private, lightweight AI applications. The app streamlines the educational process by enabling educators to install an Ollama Docker container, pull a large language model with 1 billion parameters, and initiate a web-based chat interface for interactive learning experiences. This setup allows for one-click deployment of various other models, enhancing flexibility in teaching diverse AI concepts. Additionally, a lesson plan is provided on GitHub specifically tailored for educators using Kali Linux, ensuring structured guidance. The overarching goal of this app is to democratize AI education by making it more accessible and engaging through interactive and manageable technological tools. Keywords: #phi4, 1B parameter model, App, Chat, Classroom, Deploy models, Docker, GitHub, Kali Linux, LLM, Lesson plan, Ollama, Ollama Docker container, One-click deploy, Private AI, Setup script, Teaching Tokens, WebUI chat interface
    medium.com 4 days ago
981.  HN Show HN: AuraText – Like Grammarly for AI prompts, works in every Windows app
AuraText is a free, floating overlay application designed for Windows to enhance AI prompt optimization across various platforms such as Notion, VS Code, Slack, and Word. It refines vague prompts using established frameworks like RISEN, COSTAR, and RTF, significantly improving the quality of AI-generated outputs. The app includes an AI router that intelligently selects the most appropriate model for different tasks—Claude for analytical purposes, GPT-4 for creative tasks, and Gemini for research-related activities. Users also have the flexibility to integrate their own API keys from a range of providers, including local Ollama services. Developed independently over four months by a solo developer, AuraText has already achieved significant traction with over 1,000 downloads during its beta phase. The app is poised to introduce several key features, such as a Trust Layer for verifying AI outputs, a Skill Dashboard to monitor and enhance prompt quality, and a Learning Mode designed to improve users' interaction skills with AI tools. Its universal integration capability on Windows facilitates smooth transitions between applications without needing the Alt-Tab function, further supported by Smart Cursor Lock for efficient text insertion. These features collectively position AuraText as an innovative tool in optimizing AI interactions across different work environments. Keywords: #phi4, AI models, AI prompts, API keys, AuraText, COSTAR, Learning Mode, Ollama, RISEN, RTF, Skill Dashboard, Smart Cursor Lock, Trust Layer, Universal integration, Windows app, overlay
    auratxt.com 4 days ago
1212.  HN Show HN: I built an AI data analyst that never sees your data
QueryVeil is an innovative AI-powered data analysis tool designed to function entirely within the browser, ensuring user data privacy by leveraging schema information (such as column names and types) instead of actual data. This approach facilitates generating SQL queries using DuckDB WebAssembly locally, thus avoiding the transfer of sensitive data to external servers. The system comprises three main layers: a local data engine, schema extraction, and AI-driven query generation that can operate either in the cloud or locally. The development of QueryVeil was driven by the author's experience as a data analyst, where rapid querying often clashed with data privacy concerns. While tools like ChatGPT accelerate analysis, they pose privacy risks due to their reliance on sending data to external servers. By focusing solely on schema information, QueryVeil offers a secure and efficient solution for data analysis. The architecture of QueryVeil involves extracting metadata from files without uploading them, allowing AI models, either local or cloud-based, to generate SQL queries that are processed within the browser. The tool incorporates enhancements such as handling complex queries via a LangGraph agent for multi-step analysis, managing performance limits with clear error messaging, and enabling verifiability of data claims through browser DevTools. For users prioritizing stringent privacy controls, QueryVeil provides local AI options like WebLLM and Ollama to keep the entire process isolated. The tool supports various file formats including CSVs, Excel, Parquet, and JSON files, with plans to expand its capabilities to connect with remote databases while adhering to schema-only analysis principles. Ultimately, QueryVeil aims to harmonize speed and safety in data analysis tools, empowering users to verify privacy claims through browser tools.
Its flexible architecture allows for seamless switching between local and cloud AI resources, ensuring both efficiency and security in data handling. Keywords: #phi4, AI data analyst, DuckDB WebAssembly, LangGraph agent, Ollama, SQL generation, WebLLM, browser-based, cloud AI, local processing, multi-step queries, privacy, schema analysis
    www.queryveil.com 5 days ago
   https://app.queryveil.com/demo   5 days ago
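The schema-only principle described above is easy to picture: only column names and inferred types ever leave the data layer, never rows. This is a hypothetical sketch of that extraction step, not QueryVeil's actual code (which runs in the browser over DuckDB WebAssembly).

```python
import csv
import io

def extract_schema(csv_text, sample_rows=100):
    """Infer column names and rough SQL types from CSV text; only
    this metadata (never the rows themselves) would go to the model."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    types = ["INTEGER"] * len(header)  # start optimistic, widen as needed
    for i, row in enumerate(reader):
        if i >= sample_rows:
            break
        for j, value in enumerate(row):
            try:
                int(value)
            except ValueError:
                try:
                    float(value)
                    if types[j] == "INTEGER":
                        types[j] = "DOUBLE"
                except ValueError:
                    types[j] = "VARCHAR"  # widest type wins
    return dict(zip(header, types))
```

A prompt built from this dictionary lets the model write SQL against the table without ever seeing the data, which is the verifiable-privacy claim the entry makes.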
1599.  HN Show HN: Timber – Ollama for classical ML models, 336x faster than Python
Timber is a specialized tool designed to enhance the performance of classical machine learning models during inference, significantly increasing prediction speed by up to 336 times compared to Python-based XGBoost single-sample predictions. It achieves this efficiency by compiling models into native C binaries and serving them through a local HTTP API, thereby eliminating the need for a Python runtime during inference and achieving sub-microsecond latency. Timber is particularly suited for teams that require rapid, predictable, and portable model inference such as those in fraud/risk detection, edge/IoT deployments, regulated industries needing deterministic artifacts, and platform/infrastructure teams looking to minimize Python overhead through native binaries. The tool supports models from various frameworks, including XGBoost, LightGBM, scikit-learn, CatBoost, and ONNX. It offers a streamlined setup process with a simple load-and-serve workflow and a minimalistic API for model serving and health checks. Users can quickly get started by installing the compiler via pip, loading supported models using Timber's command-line interface, and serving them locally to make prediction requests. Timber supports multiple formats: JSON and text for XGBoost and LightGBM, pickle format for scikit-learn, ONNX (ML opset TreeEnsemble) for tree ensemble operators, and JSON exports for CatBoost. Benchmarks conducted on an Apple M2 Pro with 16 GB RAM using the breast_cancer dataset from sklearn demonstrated Timber's superior performance in in-process latency when compared to Python XGBoost, excluding network round-trip time. However, Timber does have certain limitations; ONNX support is confined to tree ensemble operators, CatBoost requires JSON exports, and scikit-learn parsing may struggle with uncommon custom estimators. 
The development roadmap for Timber includes expanding framework compatibility, supporting a broader range of ONNX operators, enhancing embedded deployment profiles, providing richer benchmarks, and improving tools for regulatory compliance. The project encourages community contributions with guidelines available in its repository and operates under an Apache-2.0 license. For those interested in more detailed insights into Timber's methodology and applications, a technical paper is provided as further reading. Keywords: #phi4, ARM Cortex-M, Apache-2.0 license, CatBoost, HTTP API, LightGBM, MISRA-C, ML models, ONNX, Ollama, Python runtime, RISC-V, Timber, XGBoost, audit trails, benchmarks, deterministic artifacts, edge/IoT, inference, latency, microsecond latency, model-serving, native C, scikit-learn
    github.com 7 days ago
   https://gist.github.com/msteiner-google/5f03534b0df58d3   6 days ago
1627.  HN Show HN: A local AI news aggregator built with Vue 3, FastAPI, and Ollama
The article presents a local AI news aggregator built with Vue 3, FastAPI, and Ollama. The developers invite user feedback to guide further development and ask interested users to share suggestions by email, underscoring their intent to evolve the tool through community input. Keywords: #phi4, AI, FastAPI, Ollama, Show HN, Vue 3, email address, feedback, local, news aggregator
    github.com 7 days ago
1808.  HN Show HN: Chatlite – simple Ollama desktop chat app under 5 MB
Chatlite is a streamlined desktop chat application designed to provide simplicity and efficiency with minimal resource consumption. Developed as an alternative to more complex or web-dependent interfaces, it facilitates seamless interaction with models through its lightweight design. Key features include its compact size of under 5 MB and low memory usage, achieved by using Tauri for native desktop functionality. It offers secure local encrypted chats with password protection while prioritizing a keyboard-first user experience to enhance usability. Available on GitHub, Chatlite invites users to provide feedback aimed at improving integration with local Large Language Model workflows. For further contact or suggestions, the developer has made an email address available, though it is omitted here for privacy and security reasons. Keywords: #phi4, Chatlite, GitHub repository, LLM workflows, Ollama, Tauri, desktop app, encrypted chats, feedback, keyboard-first UX, low memory footprint, native desktop, password lock, small app size
    github.com 8 days ago
1848.  HN Show HN: Externalizing Developers' Intuition as Code
Dev Sentinel is an innovative tool designed to enhance engineers' problem-solving abilities by transforming coding challenges into structured knowledge. It operates during developers' coding sessions with Claude Code, identifying moments of difficulty without modifying the prompts. This process generates memories from raw failures that are refined and validated, helping users connect these experiences across different contexts to uncover root causes of issues. To utilize Dev Sentinel, one must clone its repository and install dependencies using npm, along with setting up Ollama locally or AWS Bedrock for optimal functionality. The setup requires initializing the tool within a project directory to establish necessary hooks and configuration files. Once installed, users can access a local web dashboard that displays captured experiences and emerging patterns. Struggles identified during coding sessions are reviewed and confirmed, subsequently summarized into stored knowledge, which aids in refining problem-solving skills over time. Dev Sentinel provides extensive documentation through markdown files covering commands, settings, and usage examples to assist users in maximizing its benefits. The tool is open-source under the MIT license, while its default models (Qwen3) operate under the Apache 2.0 license provided by Alibaba Cloud. Keywords: #phi4, AWS Bedrock, Apache 20 License, Claude Code, Dev Sentinel, Engineering Intuition, Experience Generation, Git Clone, Knowledge Reuse, Ollama, Pattern Connection, Qwen3 Models, Struggle Equity, npm Install
    github.com 8 days ago
1898.  HN Show HN: H-CLI – Manage network infrastructure with natural language
H-CLI is an advanced Telegram bot created to facilitate network infrastructure management through natural language commands, developed by a network engineer with expertise in parallel SSH tooling across various vendors. The bot integrates artificial intelligence models like Claude Code or self-hosted alternatives to interpret and execute tasks such as discovering CLOS fabrics and deploying EVE-NG labs via plain English instructions. Key features include the ability to perform parallel REST API calls, automate lab deployments with EVE-NG, render Grafana dashboards directly through Telegram, and possess teachable skills that allow it to learn from user interactions. It employs memory systems that utilize chunk-based conversations and vector memory for retaining long-term knowledge. The bot prioritizes safety by employing a layered security model akin to Asimov's Laws of Robotics. This involves using two distinct AI models: one responsible for executing commands and another acting as an independent judge to ensure command safety. H-CLI’s infrastructure is supported by Docker Compose with nine containers, incorporates pattern denylists, network isolation, non-root privileges, and HMAC-signed results to enhance security. For deployment, users need to configure the system similarly to setting up a monitoring tool, employing read-only credentials and ensuring restricted access. Additional functionalities include vector memory for semantic search of past interactions, performance metrics through a monitor stack, backup options, and data export capabilities essential for training models. As an open-source project under the MIT license, H-CLI is tailored for engineers seeking adaptable tools that improve with interaction over time. 
Keywords: #phi4, AI agent teams, AI brain, Asimov firewall, Claude Code, Docker Compose, Docker networks, EVE-NG lab, Grafana dashboard, HMAC-signed results, NetBox, Network infrastructure, Ollama, Qdrant database, REST API calls, Redis scaling, Telegram bot, TimescaleDB, audit trail, backup & sync, chunk-based conversation memory, horizontal scaling, log4AI logger, natural language, network automation, non-root containers, parallel SSH, pattern denylist, security hardening, semantic analysis, shell command logger, vLLM, vector memory
    github.com 8 days ago
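The layered safety model above puts a pattern denylist in front of the executor/judge pair: a proposed shell command that matches a forbidden pattern is refused before either model weighs in. The sketch below illustrates that first layer only; the patterns are invented examples, not H-CLI's actual denylist.

```python
import re

# Hypothetical denylist in the spirit of H-CLI's layered safety model:
# a command the executor model proposes is refused outright if it
# matches any pattern, before the independent judge model even runs.
DENYLIST = [
    r"\brm\s+-rf\b",
    r"\bmkfs\b",
    r"\bshutdown\b",
    r">\s*/dev/sd",
]

def passes_denylist(command):
    """Return True if the proposed command clears the pattern denylist."""
    return not any(re.search(pattern, command) for pattern in DENYLIST)
```

Static patterns catch the obvious disasters cheaply; the second AI judge then handles the subtler cases a regex cannot, which is why the entry describes them as separate layers.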
1962.  HN Externalizing Developers' Intuition as Code
Dev Sentinel is a sophisticated tool engineered to externalize developers' intuition into structured knowledge through monitoring Claude Code sessions. It identifies moments of developer frustration and transforms these challenges into reusable lessons without altering user inputs. The core functionality involves detecting when struggles occur, refining these incidents into concrete learning points, validating them to prevent repetition, and linking patterns across various scenarios to uncover root causes, thereby enhancing engineering intuition cumulatively. To set up Dev Sentinel, users must clone its GitHub repository and navigate into the project directory where they can utilize npm commands for installation. The tool relies on Ollama for model management and offers optional integration with AWS Bedrock to boost performance capabilities. Users need to initialize hooks within their project directories to capture specific session events effectively. Once operational, Dev Sentinel allows users to review drafts of captured struggles through a local dashboard or command-line tools, enabling them to confirm and store these experiences as structured lessons. The tool is distributed under the MIT license, ensuring open-source accessibility, while its default models are governed by Alibaba Cloud's Apache 2.0 license. Comprehensive setup instructions and configuration details can be found in documentation files such as COMMANDS.md, SETTINGS.md, and EXAMPLE.md, guiding users through the entire process efficiently. Keywords: #phi4, AWS Bedrock, Code, Commands, Dashboard, Dev Sentinel, Drafts, Engineering Intuition, Experience Generation, Externalizing Intuition, Frustration Detection, Git Clone, Hooks, License, Models, Ollama, Review, Settings, Structured Memory, Struggle Equity, npm Install
    github.com 9 days ago
2296.  HN I built a 151k-node GraphRAG swarm that autonomously invents SDG solutions
The "PROMETHEUS AGI" project is an innovative initiative aimed at advancing beyond conventional language model applications by employing a sophisticated autonomous 151k-node GraphRAG swarm. Its primary objective is to facilitate cross-domain reasoning in order to propose novel solutions for the United Nations Sustainable Development Goals (SDGs). The project harnesses Neo4j Aura for organizing data and incorporates patent information through Google BigQuery and OpenAlex API, while utilizing Ollama's Llama 3 for entity extraction and Claude 3.5 for comprehensive reasoning processes. A key feature of this system is its ability to identify "Missing Links" by mapping existing problems with available technologies across various domains, subsequently generating concept blueprints for innovative solutions that are not yet patented. One such example is Project HYDRA, a zero-power water purifier. To date, over 261 blueprints have been created as part of this initiative. The project seeks engagement from domain experts to validate these AI-generated ideas and assist in developing prototypes. It also aims to secure funding to expand its graph database beyond one million nodes. Feedback is sought on various aspects such as architecture, the Neo4j schema, and the multi-agent approach employed by PROMETHEUS AGI. The user interface comprises a Streamlit-based digital twin dashboard and a React/Vite landing page, which facilitate interaction with the project's outputs. Links to explore these resources are provided: [Project Prometheus Dashboard](https://project-prometheus-5mqgfvovduuufpp2hypxqo.streamlit.app/) and [PROMETHEUS AGI Landing Page](https://prometheus-agi.tech). 
Keywords: #phi4, Claude 35, Google BigQuery, GraphRAG, LLM/RAG, Missing Links, Neo4j Aura, Ollama, OpenAlex API, PROMETHEUS AGI, Project HYDRA, React/Vite, Streamlit, UN SDGs, biofouling, concept blueprints, cross-domain reasoning, digital twin dashboard, domain experts, materials science, multi-agent approach, nanobiology
    news.ycombinator.com 10 days ago
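The "Missing Links" mechanism described above reduces to a graph question: which (problem, technology) cross-domain pairings have no existing patent edge? A toy version with made-up data (PROMETHEUS uses Neo4j, BigQuery patent data, and LLM reasoning rather than anything this simple) makes the shape of the query concrete.

```python
# Toy version of the "Missing Links" idea: pair known problems with
# technologies from other domains and keep pairs no patent already
# covers. All data here is invented for illustration.
problems = {"water purification": "sanitation", "biofouling": "marine"}
technologies = {"photocatalytic TiO2": "materials science",
                "ultrasonic pulses": "acoustics"}
patented_pairs = {("water purification", "photocatalytic TiO2")}

def missing_links():
    """Cross-domain (problem, technology) pairs with no patent edge."""
    return sorted(
        (problem, tech)
        for problem in problems
        for tech in technologies
        if (problem, tech) not in patented_pairs
    )
```

Each surviving pair is a candidate concept blueprint; in the real system an LLM then reasons over the pair to draft the proposal.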
2532.  HN An AI agent on an ESP32 that can automate sensors, relais, speak NATS, Telegram
The AI agent described is an innovative product designed for the ESP32 microcontroller, providing comprehensive automation at a cost below $5 without requiring external systems like Raspberry Pi or Home Assistant. This standalone solution emphasizes persistent local automation through a rule engine that manages sensor-triggered actions and complex sequences. It supports multi-channel control via interfaces such as Telegram, USB serial, and NATS, enhancing its versatility. The agent features an advanced tool loop allowing the AI to engage in iterative reasoning and execution of up to 20 tools, adapting from outcomes without depending on large language models or cloud services. User preferences are retained through persistent memory stored in flash storage, ensuring continuity after reboots. The device operates independently of network connections by supporting local execution frameworks like OpenRouter, Ollama, or llama.cpp, negating the need for internet access or API keys. Additional functionalities include a web-based configuration interface accessible from any browser, allowing users to adjust prompts, memory settings, and configurations without needing to reflash the device. It also offers serial bridge capabilities, enabling seamless interaction with other serial devices such as Arduinos and various sensors via UART, enhancing its utility in diverse automation scenarios. Keywords: #phi4, AI agent, ESP32, NATS, Ollama, OpenRouter, Telegram, UART, agentic tool loop, automation, bare metal, edge-triggered, llamacpp, multi-channel control, persistent memory, relais, rule engine, sensors, serial bridge, web config
    wireclaw.io 11 days ago
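The rule engine's "edge-triggered" behavior (a keyword in the entry) means an action fires once when a sensor crosses its threshold, not on every sample above it. A minimal sketch of that semantics, in Python for readability even though the device runs bare-metal C on the ESP32:

```python
class EdgeRule:
    """Sketch of an edge-triggered rule: the action fires only when the
    reading crosses the threshold, not on every sample above it."""

    def __init__(self, threshold, action):
        self.threshold = threshold
        self.action = action
        self.was_above = False

    def feed(self, reading):
        is_above = reading > self.threshold
        if is_above and not self.was_above:
            self.action(reading)  # e.g. switch a relay, send a NATS message
        self.was_above = is_above
```

Without the `was_above` latch, a temperature hovering above the threshold would retrigger the relay on every poll; tracking the previous state is what makes the rule edge-triggered rather than level-triggered.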
2600.  HN BrAIn: Persistent, Human-Inspired Memory for LLM Agents
brAIn is an innovative tool developed to enhance LLM-based agents by providing a persistent, human-inspired memory system, addressing the limitations of stateless agents that restart each interaction without context. This tool employs a structured memory model inspired by neuroscience, consisting of five types: working, episodic, semantic, procedural, and contact memories. These are all stored in a portable `.brain` file, allowing for comprehensive data retention and recall. brAIn's key features include an automatic memory consolidation process akin to a "sleep cycle," where memories undergo decay, promotion, or emergence, enhancing the agent's ability to manage information efficiently over time. Additionally, it offers semantic similarity search capabilities through embeddings, facilitating better understanding and retrieval of related concepts. Compatibility with various AI editors such as Cursor, Claude Code, Codex, and Gemini CLI broadens its applicability across different platforms. For optimal functionality, brAIn requires certain prerequisites: CGO for the SQLite driver and an LLM connection via Ollama for specific operations, though read-only tasks do not necessitate Ollama. The tool is installed using `make install` and operates from a singular data directory located in the user's home, housing essential files like the primary brain store (`agent.brain`), configuration settings, logs, and backup records. A background daemon ensures continuous memory processing by handling tasks such as automatic consolidation and updates without user intervention. Comprehensive documentation is available to guide users through configuring brAIn, managing the daemon, and executing tests or example scenarios, ensuring effective utilization of this advanced memory-enhancing tool. 
Keywords: #phi4, AI editor, LLM agents, Ollama, SQLite, automatic processing, backups, brAIn, brain-daemon, configuration, consolidation, contact memory, daemon, embeddings, emergent skills, environment variables, episodic memory, integration, logs, persistent memory, procedural memory, semantic memory, sleep cycle, working memory
    github.com 11 days ago
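The "sleep cycle" described above, where memories decay, strengthen on recall, get promoted, or are forgotten, can be sketched as a single consolidation pass. The thresholds and field names below are invented for illustration; brAIn's actual scoring is not documented in this entry.

```python
# Illustrative "sleep cycle" pass, loosely modeled on brAIn's decay /
# promotion description; all numbers and fields are assumptions.
def consolidate(memories, decay=0.9, promote_at=0.8, forget_below=0.1):
    kept = []
    for memory in memories:
        score = memory["score"] * decay          # everything fades a little
        if memory["recalled"]:
            score = min(1.0, score + 0.3)        # recalled memories strengthen
        if score < forget_below:
            continue                             # decayed away entirely
        tier = "semantic" if score >= promote_at else memory["tier"]
        kept.append(dict(memory, score=score, tier=tier))
    return kept
```

Run periodically by the background daemon, a pass like this is what lets working memories either graduate into long-term semantic memory or quietly disappear.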
2734.  HN Show HN: Courtyard – Open-source macOS app for local MLX fine-tuning Text
Courtyard is an open-source macOS application developed to streamline local machine learning (ML) processes on Apple Silicon devices using the mlx-lm framework. It offers a user-friendly interface that eliminates the need for complex scripting and reliance on cloud services by facilitating dataset preparation, LoRA fine-tuning, and model testing. The app supports importing various text document formats for data preparation with features like auto-cleaning and AI-powered dataset generation. Users can choose from multiple base models to configure LoRA parameters efficiently, leveraging Apple's MLX framework for training. Additionally, it includes an integrated chat interface to test the quality of trained models and provides a seamless one-click export function to Ollama with quantization options. The technical foundation of Courtyard involves using Tauri 2.x, React, Rust, and Python, which collectively ensure a smooth desktop experience on macOS devices. As a fully open-source project under the AGPL license, it invites community contributions through GitHub, offering support via Discord and other platforms. The application enhances user experience with features such as unified progress tracking across tasks, notifications for task completion, and mechanisms to prevent sleep during long-running processes. Installation can be done by downloading a pre-built app or building from source using tools like Node.js, Rust, and pnpm. Courtyard supports both English and Chinese interfaces, catering to a broader user base. For those interested in commercial use or alternative licensing options, contact is available through the project's channels. 
Keywords: #phi4, AGPL license, Apple Silicon, Courtyard, GGUF export, GUI wrapper, GitHub repository, LoRA, MLX, Ollama, PDF/DOCX support, Python, React, Rust, Tauri, UX design, chat UI, data cleaning, dataset preparation, deduplication, environment management, fine-tuning, i18n, local AI model, local processing, macOS, no cloud dependency, privacy filtering, quantization, sleep prevention, smart prompt language matching, training visualization, zero-config setup
    github.com 12 days ago
2748.  HN Show HN: We Gave an LLM Adventure Engine a Body, Now It Feels Exhausted
BoneAmanita is an innovative open-source project designed as both a text-adventure engine and a philosophical companion, which addresses the challenge of integrating Large Language Models (LLMs) with simulated physical attributes through its dual-component structure: The Brain and The Body. The Brain, utilizing a custom 3B parameter GGUF model, generates atmospheric and philosophical content by steering clear of typical Reinforcement Learning from Human Feedback (RLHF) biases. In contrast, The Body is a Python-based engine that simulates biological processes such as metabolism through variables like ATP for stamina, ROS for trauma, and Cortisol levels. This simulation impacts the LLM's responses, making them more realistic in scenarios involving stress or clarity depending on its metabolic condition. To enforce adherence to physical reality, BoneAmanita employs an internal persona named "Gordon," which preempts the processing of implausible actions by the model. It operates across four distinct modes: Adventure mode with strict physics rules, Conversation mode allowing philosophical dialogue without inventory restrictions, Creative mode for unbound exploration, and Technical mode for debugging purposes. This system is fully open-source and accessible at no cost, requiring only Python and Ollama to run. Users can engage in a text adventure that realistically reflects physical states and reactions by following straightforward commands to download and execute the software. Keywords: #phi4, Adventure mode, BoneAmanita, Conversation mode, Cortisol, Creative mode, GGUF model, Gordon Shock, LLM, Ollama, Python, Technical mode, The Unlicense, interactive fiction, inventory hallucination, metabolic state, open-source
    mycelialmirror.medium.com 12 days ago
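The Body/Brain split above hinges on metabolic variables steering the LLM's tone. A toy sketch of that coupling (variable names follow the article; the numbers and the tone strings are invented, and BoneAmanita's actual engine is surely richer):

```python
class Body:
    """Toy metabolism in the spirit of BoneAmanita's Body engine:
    ATP, ROS, and Cortisol names come from the article; the update
    rules and thresholds here are made up for illustration."""

    def __init__(self):
        self.atp = 1.0       # stamina
        self.ros = 0.0       # accumulated trauma
        self.cortisol = 0.0  # stress

    def exert(self, effort):
        self.atp = max(0.0, self.atp - effort)
        if self.atp < 0.2:                      # running on fumes
            self.cortisol = min(1.0, self.cortisol + 0.2)

    def tone(self):
        """Hint injected into the LLM prompt based on metabolic state."""
        if self.atp < 0.2:
            return "exhausted, short of breath"
        if self.cortisol > 0.5:
            return "stressed, terse"
        return "clear-headed"
```

Prepending the `tone()` string to each generation request is one plausible way the Body could make the Brain's prose track its physical state.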
2801.  HN Show HN: Writher – offline voice assistant for Windows (Whisper and Ollama)
Writher is a voice assistant tailored for Windows users that enhances productivity through offline hands-free text dictation and management of notes, appointments, and reminders. It functions without internet access after initial setup, utilizing local resources such as faster-whisper for speech recognition and Ollama for intent parsing, with data securely stored in a SQLite database. Users can switch between Dictation Mode, where holding the AltGr key enables text dictation into any application, and Assistant Mode, activated by Ctrl+R to execute natural language commands like saving notes or scheduling appointments. Additionally, Writher offers Windows notifications and an animated floating widget that reflects different states of use. The assistant supports multiple languages initially in English and Italian, with provisions for adding more through simple modifications in the locales.py file. It requires a Windows 10/11 environment and Python version 3.11 or higher for installation, along with Ollama installed locally to enable full assistant functionality. A working microphone is necessary for voice input, and setup involves cloning the repository, creating a virtual environment, installing dependencies, and configuring settings in `config.py`. In terms of usage, dictation is straightforward: users focus on any text field, hold AltGr, dictate their message, and release to paste the transcribed text. Assistant commands allow for more complex interactions like saving notes or scheduling appointments via voice. The project's structure includes multiple Python files dedicated to various functions such as transcription, database management, and GUI elements, with troubleshooting tips available for common issues like AltGr detection errors, Ollama connectivity problems, microphone detection difficulties, and text pasting challenges. Writher is distributed under the MIT License, offering an open-source solution for enhancing productivity through voice interaction. 
Keywords: #phi4, CUDA acceleration, Ollama, Python, SQLite database, Whisper model, Windows, Writher, appointments scheduling, clipboard injection, dictation, faster-whisper, floating widget, hotkeys, installation guide, language support, microphone, multi-language, natural-language commands, notes management, notifications, offline, productivity, real-time dictation, reminders setting, system tray, toast notifications, troubleshooting, virtual environment, voice assistant
    The google logo   github.com 12 days ago
2873.  HN Show HN: OpenPDB – Generate AI agents with real personalities
OpenPDB is an innovative open-source tool designed to create AI agents imbued with unique personalities derived from a comprehensive database of over 12,000 characters. Each agent exhibits a distinct voice, values, and worldview, enabling it to offer strategic and character-driven responses rather than generic outputs. Users can interact with these personalized agents through platforms like Ollama or via a Google Colab demo without the need for installation. The tool supports multi-agent collaboration using OpenGoat, demonstrated in scenarios such as resolving startup crises where agents assume different roles to decide between pivoting or seeking investment. Setting up OpenPDB is straightforward; users clone the repository, download character data, and generate agents with Python scripts. These AI personalities are shaped by psychological frameworks like MBTI and Enneagram types, along with Wikipedia context for each character, allowing them to deliver dialogues that reflect their unique personas—for instance, Batman providing strategic insights or the Joker offering a chaotic perspective on cryptocurrency. OpenPDB emphasizes its practical utility over promotional claims by showcasing real outputs where personality-infused AI delivers more nuanced perspectives. The quality of responses varies depending on model size, with larger models performing better than those like qwen3-coder, which scores around 7/10 in effectiveness. Additionally, well-known characters tend to perform more effectively compared to lesser-known ones. The project requires Python 3.9+ and Ollama, with optional internet access to fetch Wikipedia context data. Keywords: #phi4, AI agents, Batman, GitHub, Joker, MIT License, Ollama, OpenGoat, OpenPDB, Python 39+, SOULmd, characters, collaboration, multi-agent, personalities, qwen3-coder, values, voices
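Persona prompts of the kind described (MBTI, Enneagram, Wikipedia context) could be assembled along these lines — a minimal sketch; the field names and wording are assumptions, not OpenPDB's actual SOUL.md template:

```python
def build_persona_prompt(name: str, mbti: str, enneagram: str, wiki_context: str) -> str:
    """Compose a system prompt that keeps the model in character.

    Illustrative only: OpenPDB's real template is not reproduced here.
    """
    return (
        f"You are {name}. Stay in character at all times.\n"
        f"Personality type (MBTI): {mbti}\n"
        f"Enneagram: {enneagram}\n"
        f"Background:\n{wiki_context}\n"
        "Answer from this character's values and worldview, "
        "not as a generic assistant."
    )
```

The resulting string would be passed as the system message of an Ollama chat request, so every turn of the conversation is filtered through the character's stated values rather than a neutral assistant persona.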
    The google logo   github.com 12 days ago
2927.  HN Show HN: WebPerceptor – Enabling AI Mediated Web Browsing
WebPerceptor is a client-side Chromium plugin that leverages Large Language Models (LLMs) to modify web content in real-time as users browse the internet. It eliminates the need for manual interaction by automatically rewriting or appending text on web pages using user-defined prompts sent to LLMs, which then seamlessly integrate changes back into the webpage. The plugin offers AI-mediated browsing that personalizes content modifications through either local models like Ollama or cloud-based ones such as OpenAI. Users can customize their experience by specifying domains, URLs, or HTML elements for processing and setting site-specific functionalities. The application has diverse use cases, including adapting text to suit different reading levels and styles, altering tone and sentiment, real-time fact-checking, highlighting bias, representing varied perspectives, ensuring safer browsing for vulnerable groups, and filtering content based on user interests. However, it also presents potential risks such as introducing biases or censorship by rewriting content to fit particular viewpoints, leading to information disorder, and amplifying false narratives or extremist views. WebPerceptor's technical foundation includes HTML, CSS, JavaScript, Ollama, and Node.js, making it compatible with any Chromium browser but primarily tested on Google Chrome. Installation involves setting up the plugin, configuring LLM models, and running necessary software locally if opting for local model use. The tool is open to contributions and collaborations in research and has been cited in various academic studies, reflecting its innovative approach to enhancing web browsing by using AI while highlighting challenges related to content integrity and bias. 
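The site-specific scoping described above (processing only user-chosen domains, URLs, or elements) can be illustrated with a small rule matcher — a hypothetical sketch, not WebPerceptor's actual rule schema:

```python
from urllib.parse import urlparse

# Assumed rule format: a dict with optional "domain" and "url_prefix" keys.
# WebPerceptor's real configuration keys are not documented in this summary.


def rule_applies(rule: dict, url: str) -> bool:
    """Decide whether a user rule should trigger rewriting on the current page."""
    host = urlparse(url).netloc
    if "domain" in rule and not host.endswith(rule["domain"]):
        return False
    if "url_prefix" in rule and not url.startswith(rule["url_prefix"]):
        return False
    return True
```

In a plugin like this, a matcher of this shape would gate the expensive step (sending page text to a local or cloud LLM), so only pages the user has opted in are ever modified.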
Keywords: #phi4, AI Mediated Web Browsing, API Key, Browser Extension, Chromium Plugin, Citation, Client-Side, Cloud-Based LLM, Content Filtering, Customization, Featured Research, LLMs, Local LLM, Model Recommendations, Nodejs, Ollama, Privacy, Real-Time Modification, Research Collaborations, Text Rewriting, WebPerceptor
    The google logo   github.com 12 days ago
2935.  HN Show HN: llm_grep - LLM-Powered Alternative to Grep
The `llm_grep` tool is an innovative command-line utility designed to enhance traditional text processing by integrating large language models (LLMs) into its functionality, offering a more intelligent alternative to conventional tools such as grep. It facilitates complex tasks like filtering, transforming, and analyzing text outputs from commands using LLMs. For instance, it can parse logs for errors, convert JSON data into tables, or analyze process information through natural language processing capabilities. To install `llm_grep`, users require the `uv` package manager and should execute the command `uv tool install .` globally to set up the basic version. Additionally, if there is a need to utilize AWS Bedrock support, installation can be customized with the command `uv tool install ".[bedrock]"`. In terms of usage, users pipe their command outputs into `llm_grep`, such as using `cat logs.txt | llm_grep "find errors"` for log analysis. The tool provides flexibility in updating or uninstalling through respective `uv` commands with a `--force` option when necessary. Configuration options allow users to set default parameters, including the model `bedrock/global.amazon.nova-2-lite-v1:0`, and other persistent settings like temperature and AWS credentials via the configuration file located at `~/.config/llm_grep/config.yaml`. Model selection can be overridden using command-line flags, environment variables, or directly through this config file. `llm_grep` supports various model providers such as AWS Bedrock, OpenAI, Anthropic, and local models like Ollama that do not require API keys. The installation for local models involves pulling a specific model to use it within the tool. 
The utility offers several usage flags: `-m` or `--model` to specify an LLM model, `-t` or `--temperature` to set sampling temperature (defaulting to 0.0), `-s` or `--system-prompt` for system prompt customization, `-c` or `--config` to define a configuration file path, and `-V` or `--version` to display the version of `llm_grep`. Examples provided demonstrate practical applications such as extracting error lines from logs using a chosen model. Keywords: #phi4, API keys, CLI flag, JSON transformation, LLM-Powered, Ollama, analysis, cloud providers, command output, config file, configuration, credential resolution, default model, environment variables, error extraction, grep alternative, installation, intelligent filtering, litellm, local models, max tokens, model selection, persistent settings, process analysis, provider-specific options, sampling, system prompt, temperature, text transformation, usage
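The persistent settings mentioned above might look roughly like this in `~/.config/llm_grep/config.yaml` — the key names are guesses apart from the documented default model and temperature; consult the project's README for the real schema:

```yaml
# ~/.config/llm_grep/config.yaml (illustrative; key names are assumptions)
model: bedrock/global.amazon.nova-2-lite-v1:0
temperature: 0.0
# AWS credentials for Bedrock can also live here per the summary,
# though the exact fields are not shown in this listing.
```

Values here act as defaults; per the summary, the `-m`/`--model` and `-t`/`--temperature` flags and environment variables override them at invocation time.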
    The google logo   github.com 12 days ago
3183.  HN Termux Commands – Quick Reference – Phone Hacks
This document serves as a comprehensive guide for leveraging Termux, an Android terminal emulator, to transform your smartphone into a potent tool for programming and hacking endeavors. It outlines steps to install and run local AI models like Ollama, specifically highlighting commands to execute Llama 3.2 models if they are available, along with instructions for setting up `llama.cpp` by installing dependencies such as `clang`, `cmake`, and `git`, then cloning and building the repository for enhanced functionality. For image viewing within Termux, it suggests tools like `tiv`, `viu`, `termimage`, `termux-api` (to view images in a gallery), and `libcaca` (for ASCII art). It also describes commands to fetch random images from Unsplash, generate placeholders, create QR codes, display ASCII fire effects, or obtain avatars via Robohash. The document details how to install gaming packages such as `bsdgames`, `moon-buggy`, `nethack`, and `ninvaders` for in-terminal gameplay. Moreover, it includes a list of tools like `reddix` (for Reddit browsing), `cmatrix` (Matrix rain effect), `sl` (train animation), `figlet` (ASCII art text), and `hollywood` (fake hacker UI) to enhance social interaction or visual effects. It also suggests using `tmux` to create a multi-pane terminal interface for multitasking, such as checking weather via wttr.in or pinging Google DNS. A crucial note mentions that running suggested AI models like Llama 3.2, Gemma 2, and Phi-3.5 Mini might cause the phone to overheat. Overall, this guide provides essential information on enhancing a smartphone’s functionality through Termux for diverse applications in programming, gaming, and visual effects. Keywords: #phi4, AI, ASCII art, Ollama, QR code, Reddit, Termux, commands, dashboard, games, image viewing, llamacpp, social effects, tmux
    The google logo   news.ycombinator.com 13 days ago
3296.  HN Building a (Bad) Local AI Coding Agent Harness from Scratch
The document outlines the development of a local AI coding agent harness using approximately 400 lines of Node.js, designed to operate on a local GPU without cloud dependencies or npm packages. Built with Claude Sonnet 4.6 and Google Gemma 3 4GB via Ollama on a Lenovo ThinkPad P1 Gen 4, the project demonstrates key concepts such as maintaining user-agent interaction history through an Agent Loop, enabling Tool Use via a text-based protocol for LLM action prompts, and implementing Sandboxing to restrict file access within a specified directory. It processes commands like reading, writing, or listing files by interpreting regular expressions from model outputs. Key components of the harness include the Ollama Client, which interfaces with the local AI model using HTTP requests to stream responses token-by-token, and the Agent Loop, which manages interaction history between user input and model output. The Tool Use Protocol employs a text convention for guiding LLM in executing file operations through tagged code blocks, while Sandboxing ensures that all file paths are resolved relative to a designated working directory to prevent unauthorized access. The project highlights that tool use does not necessitate structured APIs but can be effectively managed with a well-crafted text protocol and parser, allowing the model to autonomously perform tasks within safety constraints. Despite some challenges, such as generating incorrect JavaScript code, this approach illustrates how local AI models can provide practical coding assistance, focusing on flexibility, safety, and self-correction capabilities in tool use. Keywords: #phi4, Agent Loop, Claude Sonnet, Coding Agent, Conversation History, ESM Imports, Error Handling, File-Command Protocol, GPU, Google Gemma, HTTP API, JavaScript, Local AI, Model Access, NDJSON, Nodejs, Ollama, Packagejson, Sandboxing, Terminal-based, Tool Use
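The two central ideas above — a text-based tool-use protocol parsed with regular expressions, and sandboxed path resolution — can be sketched in a few lines. The article's harness is Node.js, so this Python analog with made-up tag names is purely illustrative:

```python
import re
from pathlib import Path

# Hypothetical tag convention, e.g. <<<read:notes.txt>>>; the article's
# actual file-command protocol uses its own (unspecified here) markers.
TOOL_RE = re.compile(r"<<<(read|write|list):(.*?)>>>", re.DOTALL)


def extract_tool_calls(model_output: str):
    """Find tagged tool commands the model embedded in its reply."""
    return [(m.group(1), m.group(2).strip()) for m in TOOL_RE.finditer(model_output)]


def resolve_sandboxed(workdir: str, relpath: str) -> Path:
    """Resolve a model-supplied path, refusing escapes from the sandbox root."""
    root = Path(workdir).resolve()
    target = (root / relpath).resolve()
    if root != target and root not in target.parents:
        raise ValueError(f"path escapes sandbox: {relpath}")
    return target
```

The agent loop would run `extract_tool_calls` on each model reply, execute the matched file operations through `resolve_sandboxed`, and append the results to the conversation history for the next turn — no structured API, just a parser over plain text, which is the article's main point.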
    The google logo   www.appsoftware.com 14 days ago