# Palmyra-mini-thinking-b GGUF Model Import Guide for Ollama

This guide provides step-by-step instructions for importing the Palmyra-mini-thinking-b GGUF model files into Ollama for local inference.

## Available Model Files

This directory contains two GGUF builds of the Palmyra-mini-thinking-b model (note that BF16 is a full-precision export, not a quantization):

- `Palmyra-mini-thinking-b-BF16.gguf` - BFloat16 precision (highest quality, largest size)
- `Palmyra-mini-thinking-b-Q8_0.gguf` - 8-bit quantization (high quality, medium size)

## Prerequisites

Before getting started, ensure you have:

- **Ollama installed** on your system ([Download from ollama.com](https://ollama.com/))
- **Sufficient RAM/VRAM** for your chosen model:
  - BF16: ~16GB+ RAM recommended
  - Q8_0: ~8GB+ RAM recommended
- **Terminal/Command Line access**
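
You can confirm the Ollama CLI is installed and the background service is reachable before proceeding:

```bash
# Print the installed Ollama version
ollama --version

# List locally registered models (also confirms the server is running)
ollama list
```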

## Quick Start Guide

### Method 1: Import Local GGUF File (Recommended)

#### Step 1: Navigate to Model Directory

```bash
cd "/Users/[user]/Documents/Model Weights/SPW2 Mini Launch/Palmyra-mini-thinking-b/GGUF/Palmyra-mini-thinking-b FIXED GGUF-BF16"
```
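
Before moving on, it can help to confirm the GGUF files are actually present in the current directory:

```bash
# Show the GGUF files and their sizes
ls -lh ./*.gguf
```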

#### Step 2: Create a Modelfile

Create a new file named `Modelfile` (no extension) with the following content:

**For BF16 version (highest quality):**

```
FROM ./Palmyra-mini-thinking-b-BF16.gguf
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
PARAMETER top_k 40
PARAMETER top_p 0.95
SYSTEM "You are Palmyra, an advanced AI assistant created by Writer. You are helpful and honest. You provide accurate and detailed responses while being concise and clear."
```

**For Q8_0 version (balanced):**

```
FROM ./Palmyra-mini-thinking-b-Q8_0.gguf
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
PARAMETER top_k 40
PARAMETER top_p 0.95
SYSTEM "You are Palmyra, an advanced AI assistant created by Writer. You are helpful and honest. You provide accurate and detailed responses while being concise and clear."
```
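
If you prefer to create the file from the shell rather than an editor, a heredoc works (shown for the BF16 variant; adjust the `FROM` line for Q8_0):

```bash
# Write the Modelfile in the current directory
cat > Modelfile <<'EOF'
FROM ./Palmyra-mini-thinking-b-BF16.gguf
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
PARAMETER top_k 40
PARAMETER top_p 0.95
SYSTEM "You are Palmyra, an advanced AI assistant created by Writer. You are helpful and honest. You provide accurate and detailed responses while being concise and clear."
EOF
```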

#### Step 3: Import the Model

```bash
ollama create Palmyra-mini-thinking-b -f Modelfile
```

#### Step 4: Run the Model

```bash
ollama run Palmyra-mini-thinking-b
```
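
`ollama run` opens an interactive chat session. You can also pass a single prompt as an argument for a one-off, non-interactive response:

```bash
# One-shot generation; Ollama prints the response and exits
ollama run Palmyra-mini-thinking-b "Summarize what the GGUF format is in one sentence."
```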

### Method 2: Using Absolute Paths

If you prefer to create the Modelfile elsewhere, use absolute paths:

```
FROM "/Users/[user]/Documents/Model Weights/SPW2 Mini Launch/Palmyra-mini-thinking-b/GGUF/Palmyra-mini-thinking-b FIXED GGUF-BF16/Palmyra-mini-thinking-b-BF16.gguf"
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
SYSTEM "You are Palmyra, an advanced AI assistant created by Writer."
```

Then create and run:

```bash
ollama create Palmyra-mini-thinking-b -f /path/to/your/Modelfile
ollama run Palmyra-mini-thinking-b
```

## Advanced Configuration

### Custom Modelfile Parameters

You can customize the model's behavior by modifying these parameters in your Modelfile. Keep comments on their own lines; trailing comments after a `PARAMETER` value may not parse correctly:

```
FROM ./Palmyra-mini-thinking-b-BF16.gguf

# Sampling parameters
# temperature: creativity (0.1-2.0)
PARAMETER temperature 0.3
# top_k: top-k sampling (1-100)
PARAMETER top_k 40
# top_p: top-p sampling (0.1-1.0)
PARAMETER top_p 0.95
# repeat_penalty: repetition penalty (0.8-1.5)
PARAMETER repeat_penalty 1.1
# num_ctx: context window size
PARAMETER num_ctx 4096
# num_predict: max tokens to generate
PARAMETER num_predict 512

# Stop sequences
PARAMETER stop "<|end|>"
PARAMETER stop "<|endoftext|>"

# System message
SYSTEM """You are Palmyra, an advanced AI assistant created by Writer.
You are helpful, harmless, and honest. You provide accurate and detailed
responses while being concise and clear. You can assist with a wide range
of tasks including writing, analysis, coding, and general questions."""
```

### Parameter Explanations

- **temperature**: Controls randomness (lower = more focused, higher = more creative)
- **top_k**: Limits vocabulary to the top K tokens
- **top_p**: Nucleus sampling threshold
- **repeat_penalty**: Reduces repetitive text
- **num_ctx**: Context window size (how much text the model remembers)
- **num_predict**: Maximum tokens to generate per response
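
These Modelfile values only set defaults. If you use Ollama's local REST API, you can also override them per request via the `options` field (a minimal sketch, assuming Ollama is listening on its default port 11434):

```bash
# One-off generation request that overrides the Modelfile defaults
curl http://localhost:11434/api/generate -d '{
  "model": "Palmyra-mini-thinking-b",
  "prompt": "Explain quantization in two sentences.",
  "stream": false,
  "options": {
    "temperature": 0.7,
    "num_predict": 256
  }
}'
```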

## Useful Commands

### List Available Models

```bash
ollama list
```

### View Model Information

```bash
ollama show Palmyra-mini-thinking-b
```

### View Modelfile of Existing Model

```bash
ollama show --modelfile Palmyra-mini-thinking-b
```
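
This is also a convenient way to build a variant: dump the existing Modelfile, edit it, and re-import under a new name (the `-lowtemp` name below is just an example):

```bash
# Export the current Modelfile as a starting point
ollama show --modelfile Palmyra-mini-thinking-b > Modelfile.custom

# After editing Modelfile.custom, register it as a new model
ollama create Palmyra-mini-thinking-b-lowtemp -f Modelfile.custom
```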

### Remove Model

```bash
ollama rm Palmyra-mini-thinking-b
```

### Pull Model from Hugging Face (Alternative Method)

If the model were available on Hugging Face, you could also use:

```bash
ollama run hf.co/username/repository-name
```
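
When a Hugging Face repository contains several GGUF quantizations, Ollama lets you pick one by appending its name as a tag (the repository path here is a placeholder):

```bash
# Select a specific quantization from the repository
ollama run hf.co/username/repository-name:Q8_0
```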

## Choosing the Right Quantization

| Version | File Size | Quality | Speed | RAM Usage | Best For |
|---------|-----------|---------|-------|-----------|----------|
| BF16 | Largest | Highest | Slower | ~16GB+ | Production, highest accuracy |
| Q8_0 | Medium | High | Faster | ~8GB+ | Balanced performance |

## Troubleshooting

### Common Issues

**1. "File not found" error:**

- Verify the file path in your Modelfile
- Use absolute paths if relative paths don't work
- Ensure the GGUF file exists in the specified location

**2. "Out of memory" error:**

- Try the Q8_0 quantization instead of BF16
- Reduce the `num_ctx` parameter (see the session example after this list)
- Close other applications to free up RAM

**3. Model runs but gives poor responses:**

- Adjust temperature and sampling parameters
- Modify the system message
- Try a higher quality quantization

**4. Slow performance:**

- Use Q8_0 quantization for faster inference
- Reduce `num_ctx` if you don't need long context
- Ensure you have sufficient RAM/VRAM
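
You can experiment with `num_ctx` and other parameters without editing the Modelfile: inside an interactive `ollama run` session, the `/set parameter` command changes them for that session only:

```
>>> /set parameter num_ctx 2048
>>> /set parameter temperature 0.5
```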

### Getting Help

- Check Ollama documentation: [https://github.com/ollama/ollama](https://github.com/ollama/ollama)
- Ollama Discord community
- Hugging Face GGUF documentation: [https://huggingface.co/docs/hub/en/gguf](https://huggingface.co/docs/hub/en/gguf)

## Additional Resources

- [Ollama Official Documentation](https://github.com/ollama/ollama/blob/main/docs/README.md)
- [Hugging Face Ollama Integration Guide](https://huggingface.co/docs/hub/en/ollama)
- [GGUF Format Documentation](https://huggingface.co/docs/hub/en/gguf)
- [Modelfile Syntax Reference](https://github.com/ollama/ollama/blob/main/docs/modelfile.md)

## Example Usage

Once your model is running, you can interact with it:

```
>>> Hello! Can you tell me about yourself?
Hello! I'm Palmyra, an AI assistant created by Writer. I'm designed to be helpful,
harmless, and honest in my interactions. I can assist you with a wide variety of
tasks including writing, analysis, answering questions, coding help, and general
conversation. I aim to provide accurate and detailed responses while being concise
and clear. How can I help you today?

>>> What's the significance of rabbits to Fibonacci?
Rabbits played a significant role in the development of the Fibonacci sequence...
```

## License

Please refer to the original model license and terms of use from Writer/Palmyra-mini-thinking-b.

---

**Note**: This guide is based on Ollama's official documentation and community best practices. For the most up-to-date information, always refer to the [official Ollama documentation](https://github.com/ollama/ollama).