Running DeepSeek-R1 Locally with Ollama and Open WebUI#
DeepSeek R1 Has Dropped#
DeepSeek-R1 [1] is the latest open-source LLM and, to my knowledge, the first open-source reasoning model. While I'm unfamiliar with the intricacies of reasoning models, the gist of it is that these LLMs "think through" the problem before responding. In other words, as part of the output that you get from your prompt, you also get the chain of thought [2] that supports the reasoning behind the model's output. This provides context as to why the model generated its final output.
To be clear, I wouldn't call these models self-explaining; at the end of the day, LLMs are still considered black boxes that generate text based on statistical and mathematical computations. Just because DeepSeek-R1 "thinks through" a problem does not mean that it is sentient, accurate, or correct. There is still a need for human-in-the-loop style usage when leveraging these models to evaluate the correctness of the response.
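In practice, when you run DeepSeek-R1 through Ollama, the chain of thought arrives wrapped in `<think>…</think>` tags ahead of the final answer. That tag convention is specific to the R1 family rather than a general LLM standard, so treat this as a sketch of how you might separate the two for human review:

```python
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    """Separate the <think>...</think> chain of thought from the final answer."""
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if match is None:
        # No reasoning block found; the whole output is the answer.
        return "", raw.strip()
    reasoning = match.group(1).strip()
    answer = (raw[:match.start()] + raw[match.end():]).strip()
    return reasoning, answer
```

Keeping the reasoning text separate makes it easier to show a human reviewer why the model answered the way it did, without polluting the answer itself.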
With the context and clarification out of the way, how can you leverage DeepSeek-R1 locally? And more broadly, how do you do so with many open-source LLMs?
Ollama As An Inference Server#
You can leverage Ollama (ollama/ollama), an open-source inference engine that is designed to work with quantized LLMs [3] via the GGUF file format (ggml-org/ggml) on models hosted on the Ollama Model Hub (https://ollama.com/search).
An inference engine is a utility to run machine and deep learning models efficiently by optimizing the modelâs underlying computational graph.
The computational graph is similar to a program's call graph (or the order in which instructions are executed), but for mathematical operations.
Quantized LLMs are large language models whose computational graph uses a reduced number of bits to represent its values, either as lower-precision floating point numbers or as integers.
Deep learning models are often trained using bit widths of 32 or 64 to represent the nuances of the data. Reducing the bit width or precision (e.g., moving from a floating point representation to an integer representation) often improves the latency of the model (measured in tokens per second) at the cost of numerical precision in its answers.
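As a toy illustration of the trade-off (8-bit affine quantization over a small list of weights; this is a sketch of the idea, not how GGUF actually packs tensors):

```python
def quantize_int8(values: list[float]) -> tuple[list[int], float, float]:
    """Map floats onto 256 integer levels (affine/min-max quantization)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0  # avoid a zero scale for constant inputs
    q = [round((v - lo) / scale) for v in values]
    return q, scale, lo

def dequantize(q: list[int], scale: float, lo: float) -> list[float]:
    """Recover approximate floats from the integer levels."""
    return [lo + qi * scale for qi in q]
```

Each recovered value is off by at most one quantization step (`scale`), which is the precision you give up in exchange for storing one byte per weight instead of four or eight.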
Ollama provides a very simple interface to get started with using LLMs locally. Alternatives do exist (see vLLM), but the tooling surrounding Ollama is extensive and well-documented, so it is my preferred choice when running LLMs locally.
As Ollama is a command-line utility, it can be difficult to leverage tooling such as document and image reasoning, web searching, retrieval augmented generation (RAG) [4], and multi-modal data analysis without developing your own interface. This is where GUI interfaces such as Open WebUI (open-webui/open-webui) fill the gap.
Open WebUI#
Open WebUI is a self-hostable application that communicates with Ollama via Ollama's HTTP REST API. It provides a ChatGPT-like interface that I find familiar while exposing existing ChatGPT features such as image generation, document reasoning, RAG, and web search. It also supports new features like the ability to chain multiple models together: provide one model with a prompt, then automatically pass its response into a second or third LLM for post-processing! I think it's a neat project and an exemplar of the Ollama ecosystem.
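Under the hood, a frontend like Open WebUI just POSTs JSON to Ollama's `/api/chat` endpoint. A minimal sketch of that interaction using only the Python standard library (the endpoint and payload shape follow Ollama's REST API; the helper names are my own):

```python
import json
from urllib.request import Request, urlopen

def build_chat_request(model: str, prompt: str,
                       base_url: str = "http://localhost:11434") -> Request:
    """Build the POST request a chat frontend sends to Ollama's /api/chat."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one JSON response instead of a token stream
    }).encode("utf-8")
    return Request(f"{base_url}/api/chat", data=body,
                   headers={"Content-Type": "application/json"})

def chat(model: str, prompt: str) -> str:
    """Send the request to a running Ollama instance and return the reply text."""
    with urlopen(build_chat_request(model, prompt)) as resp:
        return json.load(resp)["message"]["content"]
```

Anything that can speak this small HTTP API, from a web UI to a shell script, can drive a local model, which is why the Ollama ecosystem has grown so quickly.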
Putting It All Together#
Having gone through all of this now, how can we install these tools?
If you are on an M series Mac, you should install Ollama locally and ignore all references to the Ollama Docker installation hereafter. This is because Ollama via Docker does not support GPU acceleration on M series Macs, but the compiled binary does.
For everyone else, I recommend installing Ollama and Open WebUI via Docker Compose using this YAML file:
docker-compose.yml file contents
version: '3.8'
name: ai
services:
  ollama:
    container_name: ollama
    image: ollama/ollama:0.5.7
    restart: always
    networks:
      - ollama-network
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
  open-webui:
    container_name: open-webui
    image: ghcr.io/open-webui/open-webui:0.5.7
    restart: always
    extra_hosts:
      - "host.docker.internal:host-gateway"
    networks:
      - ollama-network
    ports:
      - "3000:8080"
    volumes:
      - open-webui:/app/backend/data

networks:
  ollama-network:
    external: false

volumes:
  ollama:
  open-webui:
Copy this to a docker-compose.yml file and then run:
Using docker compose to setup services
docker compose --file ./docker-compose.yml create
docker compose --file ./docker-compose.yml start
This installs Ollama at its latest version (as of writing) with NVIDIA GPU acceleration support (if you don't have NVIDIA GPU support for Docker, are using a different GPU vendor, or intend to run this on CPU, see this post from Ollama). It also installs the latest version of Open WebUI (as of writing). The Ollama HTTP REST API is exposed on port 11434 and Open WebUI is exposed on port 3000.
Once installed, run the following command to install DeepSeek-R1 from Ollama's Model Hub:
Pulling DeepSeek-R1 from the Ollama model hub
docker compose --file ./docker-compose.yml exec ollama ollama pull deepseek-r1:7b
Then refresh your browser's connection to Open WebUI (via http://localhost:3000) and you should be able to start using DeepSeek-R1 locally!
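If you want to confirm the pull succeeded without opening the browser, you can list installed models through the same REST API (`GET /api/tags` is Ollama's model-listing endpoint; the helper names below are my own):

```python
import json
from urllib.request import urlopen

def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON returned by Ollama's GET /api/tags."""
    return [model["name"] for model in tags_payload.get("models", [])]

def installed_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Query a running Ollama instance for the models it has installed."""
    with urlopen(f"{base_url}/api/tags") as resp:
        return model_names(json.load(resp))
```

If `deepseek-r1:7b` appears in the returned list, the pull worked and the model is ready for Open WebUI to use.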
Bibliography#
Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, et al. DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature, 645(8081):633–638, September 2025. Publisher: Nature Publishing Group. URL: https://www.nature.com/articles/s41586-025-09422-z (visited on 2025-09-30), doi:10.1038/s41586-025-09422-z.
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In Proceedings of the 36th International Conference on Neural Information Processing Systems, pages 24824–24837. November 2022. URL: https://dlnext.acm.org/doi/10.5555/3600270.3602070 (visited on 2025-11-14), doi:10.5555/3600270.3602070.
Song Han, Huizi Mao, and William J. Dally. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. February 2016. arXiv:1510.00149 [cs]. URL: http://arxiv.org/abs/1510.00149 (visited on 2025-11-14), doi:10.48550/arXiv.1510.00149.
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. In Advances in Neural Information Processing Systems, volume 33, 9459–9474. Curran Associates, Inc., 2020. URL: https://proceedings.neurips.cc/paper/2020/hash/6b493230205f780e1bc26945df7481e5-Abstract.html (visited on 2025-11-14).