Running DeepSeek-R1 Locally with Ollama and Open WebUI#
DeepSeek R1 Has Dropped#
DeepSeek-R1 [1] is the latest open-source LLM and, to my knowledge, the first open-source reasoning model. While I'm unfamiliar with the intricacies of reasoning models, the gist of it is that these LLMs "think through" the problem before responding. In other words, as part of the output that you get from your prompt, you also get the chain of thought [2] that supports the reasoning behind the model's output. This provides context as to why the model generated its final output.
To be clear, I wouldn't call these models self-explaining; at the end of the day, LLMs are still considered black boxes that generate text based on statistical and mathematical computations. Just because DeepSeek-R1 "thinks through" a problem does not mean that it is sentient, accurate, or correct. There is still a need for human-in-the-loop style usage when leveraging these models to evaluate the correctness of the response.
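In practice, when you run DeepSeek-R1 through Ollama, the chain of thought arrives wrapped in `<think>…</think>` tags ahead of the final answer. That tag convention is specific to the R1 family rather than a general LLM standard, so treat this as a sketch of how you might separate the two for human review:

```python
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    """Separate the <think>...</think> chain of thought from the final answer."""
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if match is None:
        # No reasoning block found; the whole output is the answer.
        return "", raw.strip()
    reasoning = match.group(1).strip()
    answer = (raw[:match.start()] + raw[match.end():]).strip()
    return reasoning, answer
```

Keeping the reasoning text separate makes it easier to show a human reviewer why the model answered the way it did, without polluting the answer itself.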
With the context and clarification out of the way, how can you leverage DeepSeek-R1 locally? And more broadly, how do you do so with many open-source LLMs?
Ollama As An Inference Server#
You can leverage Ollama (ollama/ollama), an open-source inference engine that is designed to work with quantized LLMs [3] via the GGUF file format (ggml-org/ggml) on models hosted on the Ollama Model Hub (https://ollama.com/search).
An inference engine is a utility to run machine and deep learning models efficiently by optimizing the modelâs underlying computational graph.
The computational graph is similar to a program's call graph (or the order in which instructions are executed), but for mathematical operations.
Quantized LLMs are large language models whose computational graph uses a reduced number of bits to represent its values, either as lower-precision floating point numbers or as integers.
Deep learning models are often trained using bit widths of 32 or 64 to represent the nuances of the data. Reducing the bit width or precision (e.g., moving from a floating point representation to an integer representation) often improves the latency of the model (measured in tokens per second) at the cost of numerical precision in its answers.
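As a toy illustration of the trade-off (8-bit affine quantization over a small list of weights; this is a sketch of the idea, not how GGUF actually packs tensors):

```python
def quantize_int8(values: list[float]) -> tuple[list[int], float, float]:
    """Map floats onto 256 integer levels (affine/min-max quantization)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0  # avoid a zero scale for constant inputs
    q = [round((v - lo) / scale) for v in values]
    return q, scale, lo

def dequantize(q: list[int], scale: float, lo: float) -> list[float]:
    """Recover approximate floats from the integer levels."""
    return [lo + qi * scale for qi in q]
```

Each recovered value is off by at most one quantization step (`scale`), which is the precision you give up in exchange for storing one byte per weight instead of four or eight.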
Ollama provides a very simple interface to get started with using LLMs locally. Alternatives do exist (see vLLM), but the tooling surrounding Ollama is extensive and well-documented, so it is my preferred choice when running LLMs locally.
As Ollama is a command-line utility, it can be difficult to leverage tooling such as document and image reasoning, web searching, retrieval augmented generation (RAG) [4], and multi-modal data analysis without developing your own interface. This is where GUI interfaces such as Open WebUI (open-webui/open-webui) fill the gap.
Open WebUI#
Open WebUI is a self-hostable application that communicates with Ollama via Ollama's HTTP REST API. It provides a ChatGPT-like interface that I find familiar while exposing existing ChatGPT features such as image generation, document reasoning, RAG, and web search. It also supports new features like the ability to chain multiple models together: provide one model with a prompt, then automatically pass its response into a second or third LLM for post-processing! I think it's a neat project and an exemplar of the Ollama ecosystem.
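Under the hood, a frontend like Open WebUI just POSTs JSON to Ollama's `/api/chat` endpoint. A minimal sketch of that interaction using only the Python standard library (the endpoint and payload shape follow Ollama's REST API; the helper names are my own):

```python
import json
from urllib.request import Request, urlopen

def build_chat_request(model: str, prompt: str,
                       base_url: str = "http://localhost:11434") -> Request:
    """Build the POST request a chat frontend sends to Ollama's /api/chat."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one JSON response instead of a token stream
    }).encode("utf-8")
    return Request(f"{base_url}/api/chat", data=body,
                   headers={"Content-Type": "application/json"})

def chat(model: str, prompt: str) -> str:
    """Send the request to a running Ollama instance and return the reply text."""
    with urlopen(build_chat_request(model, prompt)) as resp:
        return json.load(resp)["message"]["content"]
```

Anything that can speak this small HTTP API, from a web UI to a shell script, can drive a local model, which is why the Ollama ecosystem has grown so quickly.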
Putting It All Together#
Having gone through all of this now, how can we install these tools?
If you are on an M series Mac, you should install Ollama locally and ignore all references to the Ollama Docker installation hereafter. This is because Ollama via Docker does not support GPU acceleration on M series Macs, but the compiled binary does.
For everyone else, I recommend installing Ollama and Open WebUI via Docker Compose using this YAML file:
docker-compose.yml file contents
version: '3.8'
name: ai
services:
  ollama:
    container_name: ollama
    image: ollama/ollama:0.5.7
    restart: always
    networks:
      - ollama-network
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
  open-webui:
    container_name: open-webui
    image: ghcr.io/open-webui/open-webui:0.5.7
    restart: always
    extra_hosts:
      - "host.docker.internal:host-gateway"
    networks:
      - ollama-network
    ports:
      - "3000:8080"
    volumes:
      - open-webui:/app/backend/data

networks:
  ollama-network:
    external: false

volumes:
  ollama:
  open-webui:
Copy this to a docker-compose.yml file and then run:
Using docker compose to setup services
docker compose --file ./docker-compose.yml create
docker compose --file ./docker-compose.yml start
This installs Ollama at its latest version (as of writing) with NVIDIA GPU acceleration support (if you don't have NVIDIA GPU support for Docker, are using a different GPU vendor, or intend to run this on CPU, see this post from Ollama). It also installs the latest version of Open WebUI (as of writing). The Ollama HTTP REST API is exposed on port 11434 and Open WebUI is exposed on port 3000.
Once installed, run the following command to install DeepSeek-R1 from Ollama's Model Hub:
Pulling DeepSeek-R1 from the Ollama model hub
docker compose --file ./docker-compose.yml exec ollama ollama pull deepseek-r1:7b
Then refresh your browser's connection to Open WebUI (via http://localhost:3000) and you should be able to start using DeepSeek-R1 locally!
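If you want to confirm the pull succeeded without opening the browser, you can list installed models through the same REST API (`GET /api/tags` is Ollama's model-listing endpoint; the helper names below are my own):

```python
import json
from urllib.request import urlopen

def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON returned by Ollama's GET /api/tags."""
    return [model["name"] for model in tags_payload.get("models", [])]

def installed_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Query a running Ollama instance for the models it has installed."""
    with urlopen(f"{base_url}/api/tags") as resp:
        return model_names(json.load(resp))
```

If `deepseek-r1:7b` appears in the returned list, the pull worked and the model is ready for Open WebUI to use.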
Bibliography#
Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, et al. DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature, 645(8081):633–638, September 2025. Publisher: Nature Publishing Group. URL: https://www.nature.com/articles/s41586-025-09422-z (visited on 2025-09-30), doi:10.1038/s41586-025-09422-z.
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In Proceedings of the 36th International Conference on Neural Information Processing Systems, pages 24824–24837. November 2022. URL: https://dlnext.acm.org/doi/10.5555/3600270.3602070 (visited on 2025-11-14), doi:10.5555/3600270.3602070.
Song Han, Huizi Mao, and William J. Dally. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. February 2016. arXiv:1510.00149 [cs]. URL: http://arxiv.org/abs/1510.00149 (visited on 2025-11-14), doi:10.48550/arXiv.1510.00149.
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. In Advances in Neural Information Processing Systems, volume 33, 9459–9474. Curran Associates, Inc., 2020. URL: https://proceedings.neurips.cc/paper/2020/hash/6b493230205f780e1bc26945df7481e5-Abstract.html (visited on 2025-11-14).