# Palmyra-mini-thinking-b GGUF Model Import Guide for Ollama

This guide provides step-by-step instructions for importing the Palmyra-mini-thinking-b GGUF model files into Ollama for local inference.

## Available Model Files

This directory contains two GGUF builds of the Palmyra-mini-thinking-b model (note that BF16 is a full-precision export, not a quantization):

- `Palmyra-mini-thinking-b-BF16.gguf` - BFloat16 precision (highest quality, largest size)
- `Palmyra-mini-thinking-b-Q8_0.gguf` - 8-bit quantization (high quality, medium size)

## Prerequisites

Before getting started, ensure you have:

- **Ollama installed** on your system ([Download from ollama.com](https://ollama.com/))
- **Sufficient RAM/VRAM** for your chosen model:
  - BF16: ~16GB+ RAM recommended
  - Q8_0: ~8GB+ RAM recommended
- **Terminal/Command Line access**
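
You can confirm the Ollama CLI is installed and the background service is reachable before proceeding:

```bash
# Print the installed Ollama version
ollama --version

# List locally registered models (also confirms the server is running)
ollama list
```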

## Quick Start Guide

### Method 1: Import Local GGUF File (Recommended)

#### Step 1: Navigate to Model Directory

```bash
cd "/Users/[user]/Documents/Model Weights/SPW2 Mini Launch/Palmyra-mini-thinking-b/GGUF/Palmyra-mini-thinking-b FIXED GGUF-BF16"
```
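
Before moving on, it can help to confirm the GGUF files are actually present in the current directory:

```bash
# Show the GGUF files and their sizes
ls -lh ./*.gguf
```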

#### Step 2: Create a Modelfile

Create a new file named `Modelfile` (no extension) with the following content:

**For BF16 version (highest quality):**

```
FROM ./Palmyra-mini-thinking-b-BF16.gguf
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
PARAMETER top_k 40
PARAMETER top_p 0.95
SYSTEM "You are Palmyra, an advanced AI assistant created by Writer. You are helpful and honest. You provide accurate and detailed responses while being concise and clear."
```

**For Q8_0 version (balanced):**

```
FROM ./Palmyra-mini-thinking-b-Q8_0.gguf
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
PARAMETER top_k 40
PARAMETER top_p 0.95
SYSTEM "You are Palmyra, an advanced AI assistant created by Writer. You are helpful and honest. You provide accurate and detailed responses while being concise and clear."
```
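
If you prefer to create the file from the shell rather than an editor, a heredoc works (shown for the BF16 variant; adjust the `FROM` line for Q8_0):

```bash
# Write the Modelfile in the current directory
cat > Modelfile <<'EOF'
FROM ./Palmyra-mini-thinking-b-BF16.gguf
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
PARAMETER top_k 40
PARAMETER top_p 0.95
SYSTEM "You are Palmyra, an advanced AI assistant created by Writer. You are helpful and honest. You provide accurate and detailed responses while being concise and clear."
EOF
```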

#### Step 3: Import the Model

```bash
ollama create Palmyra-mini-thinking-b -f Modelfile
```

#### Step 4: Run the Model

```bash
ollama run Palmyra-mini-thinking-b
```
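
`ollama run` opens an interactive chat session. You can also pass a single prompt as an argument for a one-off, non-interactive response:

```bash
# One-shot generation; Ollama prints the response and exits
ollama run Palmyra-mini-thinking-b "Summarize what the GGUF format is in one sentence."
```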

### Method 2: Using Absolute Paths

If you prefer to create the Modelfile elsewhere, use absolute paths:

```
FROM "/Users/[user]/Documents/Model Weights/SPW2 Mini Launch/Palmyra-mini-thinking-b/GGUF/Palmyra-mini-thinking-b FIXED GGUF-BF16/Palmyra-mini-thinking-b-BF16.gguf"
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
SYSTEM "You are Palmyra, an advanced AI assistant created by Writer."
```

Then create and run:

```bash
ollama create Palmyra-mini-thinking-b -f /path/to/your/Modelfile
ollama run Palmyra-mini-thinking-b
```

## Advanced Configuration

### Custom Modelfile Parameters

You can customize the model's behavior by modifying these parameters in your Modelfile. Keep comments on their own lines; trailing comments after a `PARAMETER` value may not parse correctly:

```
FROM ./Palmyra-mini-thinking-b-BF16.gguf

# Sampling parameters
# temperature: creativity (0.1-2.0)
PARAMETER temperature 0.3
# top_k: top-k sampling (1-100)
PARAMETER top_k 40
# top_p: top-p sampling (0.1-1.0)
PARAMETER top_p 0.95
# repeat_penalty: repetition penalty (0.8-1.5)
PARAMETER repeat_penalty 1.1
# num_ctx: context window size
PARAMETER num_ctx 4096
# num_predict: max tokens to generate
PARAMETER num_predict 512

# Stop sequences
PARAMETER stop "<|end|>"
PARAMETER stop "<|endoftext|>"

# System message
SYSTEM """You are Palmyra, an advanced AI assistant created by Writer.
You are helpful, harmless, and honest. You provide accurate and detailed
responses while being concise and clear. You can assist with a wide range
of tasks including writing, analysis, coding, and general questions."""
```

### Parameter Explanations

- **temperature**: Controls randomness (lower = more focused, higher = more creative)
- **top_k**: Limits vocabulary to the top K tokens
- **top_p**: Nucleus sampling threshold
- **repeat_penalty**: Reduces repetitive text
- **num_ctx**: Context window size (how much text the model remembers)
- **num_predict**: Maximum tokens to generate per response
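
These Modelfile values only set defaults. If you use Ollama's local REST API, you can also override them per request via the `options` field (a minimal sketch, assuming Ollama is listening on its default port 11434):

```bash
# One-off generation request that overrides the Modelfile defaults
curl http://localhost:11434/api/generate -d '{
  "model": "Palmyra-mini-thinking-b",
  "prompt": "Explain quantization in two sentences.",
  "stream": false,
  "options": {
    "temperature": 0.7,
    "num_predict": 256
  }
}'
```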

## Useful Commands

### List Available Models

```bash
ollama list
```

### View Model Information

```bash
ollama show Palmyra-mini-thinking-b
```

### View Modelfile of Existing Model

```bash
ollama show --modelfile Palmyra-mini-thinking-b
```
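
This is also a convenient way to build a variant: dump the existing Modelfile, edit it, and re-import under a new name (the `-lowtemp` name below is just an example):

```bash
# Export the current Modelfile as a starting point
ollama show --modelfile Palmyra-mini-thinking-b > Modelfile.custom

# After editing Modelfile.custom, register it as a new model
ollama create Palmyra-mini-thinking-b-lowtemp -f Modelfile.custom
```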

### Remove Model

```bash
ollama rm Palmyra-mini-thinking-b
```

### Pull Model from Hugging Face (Alternative Method)

If the model were available on Hugging Face, you could also use:

```bash
ollama run hf.co/username/repository-name
```
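
When a Hugging Face repository contains several GGUF quantizations, Ollama lets you pick one by appending its name as a tag (the repository path here is a placeholder):

```bash
# Select a specific quantization from the repository
ollama run hf.co/username/repository-name:Q8_0
```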

## Choosing the Right Quantization

| Version | File Size | Quality | Speed | RAM Usage | Best For |
|---------|-----------|---------|-------|-----------|----------|
| BF16 | Largest | Highest | Slower | ~16GB+ | Production, highest accuracy |
| Q8_0 | Medium | High | Faster | ~8GB+ | Balanced performance |

## Troubleshooting

### Common Issues

**1. "File not found" error:**

- Verify the file path in your Modelfile
- Use absolute paths if relative paths don't work
- Ensure the GGUF file exists in the specified location

**2. "Out of memory" error:**

- Try the Q8_0 quantization instead of BF16
- Reduce the `num_ctx` parameter (see the session example after this list)
- Close other applications to free up RAM

**3. Model runs but gives poor responses:**

- Adjust temperature and sampling parameters
- Modify the system message
- Try a higher quality quantization

**4. Slow performance:**

- Use Q8_0 quantization for faster inference
- Reduce `num_ctx` if you don't need long context
- Ensure you have sufficient RAM/VRAM
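
You can experiment with `num_ctx` and other parameters without editing the Modelfile: inside an interactive `ollama run` session, the `/set parameter` command changes them for that session only:

```
>>> /set parameter num_ctx 2048
>>> /set parameter temperature 0.5
```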

### Getting Help

- Check Ollama documentation: [https://github.com/ollama/ollama](https://github.com/ollama/ollama)
- Ollama Discord community
- Hugging Face GGUF documentation: [https://huggingface.co/docs/hub/en/gguf](https://huggingface.co/docs/hub/en/gguf)

## Additional Resources

- [Ollama Official Documentation](https://github.com/ollama/ollama/blob/main/docs/README.md)
- [Hugging Face Ollama Integration Guide](https://huggingface.co/docs/hub/en/ollama)
- [GGUF Format Documentation](https://huggingface.co/docs/hub/en/gguf)
- [Modelfile Syntax Reference](https://github.com/ollama/ollama/blob/main/docs/modelfile.md)

## Example Usage

Once your model is running, you can interact with it:

```
>>> Hello! Can you tell me about yourself?
Hello! I'm Palmyra, an AI assistant created by Writer. I'm designed to be helpful,
harmless, and honest in my interactions. I can assist you with a wide variety of
tasks including writing, analysis, answering questions, coding help, and general
conversation. I aim to provide accurate and detailed responses while being concise
and clear. How can I help you today?

>>> What's the significance of rabbits to Fibonacci?
Rabbits played a significant role in the development of the Fibonacci sequence...
```

## License

Please refer to the original model license and terms of use from Writer/Palmyra-mini-thinking-b.

---

**Note**: This guide is based on Ollama's official documentation and community best practices. For the most up-to-date information, always refer to the [official Ollama documentation](https://github.com/ollama/ollama).