Beyond Auto Complete: Coding with OpenCode, Gemini, and the GitHub CoPilot#

Blog Post Software Engineering Artificial Intelligence OpenCode

From Autocomplete to Agentic Independence#

Coding assistance tools have long served as reliable “copilots.” For years, features like syntax highlighting, basic autocomplete, and snippet insertion enabled engineers to write more correct code with fewer iterations. However, we have recently moved past the era of mere assistance and into the era of conding agents. Unlike the tools of two years ago, modern agents don’t just suggest the next line of text; they manage and understand the context of an entire repository.

Previously, these assistance methods were powered by a mix of heuristic-based systems, traditional machine learning, and early deep learning models. Today, the software engineering community has access to advanced, Large Language Model (LLM) powered agents. These tools leverage the next-token prediction and attention mechanisms that make models like OpenAI’s ChatGPT and Google’s Gemini Pro so effective. Furthermore, because these models are specifically fine-tuned on vast libraries of open-source projects, they have become phenomenal at complex reasoning across a codebase.

What truly sets a “coding agent” apart from a “copilot” is its ability to interact with the environment. Beyond simple text prediction, these agents — when integrated into modern IDEs or TUIs — have access to system-level tooling and project-wide understanding. They can:

  • Read and create complex directory structures.

  • Execute terminal commands like make or npm test.

  • Access outside data through the Model Context Protocol (MCP) or the command line to parse documentation hosted online.

Perhaps most impressively, these agents utilize a closed-loop feedback mechanism. The agent can execute a command, capture the output (such as a compiler error or a failed test), and then review that output to adjust its next move.

In my own workflow, I have watched these agents iteratively improve a project by generating a test case, writing the implementation, watching the test fail, and then refactoring the code until it passes. Witnessing this level of autonomous troubleshooting happen locally on my own laptop is awesome.

Code as an Enabler For Science#

This shift is critical because, as a software engineering researcher, my primary objective is not to produce “production-grade” software, but to publish high-quality scientific research. While it is vital to write code that functionally enables a study, I am not traditionally incentivized — aside from growing reproducibility standards — to obsess over performant, optimized, or perfectly documented code. While the scientific community is starting to reckon with the technical debt inherent in research software, there currently are few ramifications for me to avoid spending time on unit tests or docstrings.

Of course, this isn’t to say I ignore quality. I personally strive to uphold high engineering standards in my work and for the collaborators I lead. However, I would be remiss if I didn’t admit that there are times when I’ve written “less than stellar” code just to meet a pressing conference deadline.

This is precisely why coding agents are a revelation for productivity. By offloading the “grunt work” of coding to an agent, I can redirect my energy toward reviewing background literature, refining scientific methodologies, and extracting deeper insights from data. Previously, scaling this work required training a human colleague to think and act like a software engineer — a significant time investment. Now, I can pair that same colleague with an agent, allowing the entire team to focus on the more pressing challenges of our research.

However, this speed requires a healthy dose of skepticism. While the generation capabilities of these agents are staggering, they are far from infallible. If left unchecked, they can wreck havoc across an entire project directory by introducing subtle bugs, logic errors, or gutting the functionality of components. Even if an agent manages to pass a series of test cases, that is not a guarantee of correctness. I strongly recommend treating every agentic output as a “draft” that requires human review. Understanding exactly what you are asking the agent to do — and critically evaluating the result — is the only way to ensure the AI remains an enabler for science rather than a liability.

However, performing this critical review shouldn’t require a context-switch that breaks your momentum. As a researcher, I am rarely working in a single local window; I am typically juggling multiple SSH sessions across different remote clusters and headnodes. In this environment, opening a heavy, GUI-based IDE just to interact with an agent is a performance killer. It introduces unnecessary friction when you need to stay close to the raw output of the machine.

To maintain high standards without leaving the environment where my code actually lives, I need an agent that exists exactly where I do: the command line. This need for a “terminal-first” philosophy — one that is performant, portable, and stays out of the way — is what led me to my current setup.

OpenCode as an Enabler of Agent Diversity#

My preferred environment for this agentic workflow is OpenCode 1https://opencode.ai, a Terminal User Interface (TUI). While dedicated AI IDEs like Cursor or Windsurf are impressive, they often feel like walled gardens. I chose OpenCode for its lean architecture and its alignment with the “Unix philosophy” of doing one thing well.

The advantages of this TUI-first approach, among others, are:

  1. Limited Bloat: It is exceptionally “light.” There is no UI cruft to navigate, making it easy to install, configure, and run over low-bandwidth SSH sessions where a GUI would lag or fail.

  2. A Defense Against Vendor Lock-in: OpenCode is built to be model-agnostic. It treats LLMs as interchangeable components rather than fixed features. In a single terminal session, I can use OpenAI’s GPT Codex for high-level project planning and then immediately pivot to Anthropic’’s Claude 4.6 Opus for the actual implementation.

Beyond simplicity, the primary technical benefit is provider diversity. OpenCode allows you to switch between model providers, Model Context Protocol (MCP) servers, and specific agents on a per-prompt basis. This modularity enables advanced strategies like prompt-price optimization: you can use a Small Language Model (SLM) running locally on your machine to index and understand the project structure, then “call in” a frontier-lab LLM only when it’s time to generate complex logic.

This flexibility is also a safeguard for your workflow. If you hit a rate limit or exhaust your monthly API credits with one provider, you don’t have to stop working. You can simply pass the existing session context to a different provider and keep your momentum assuming you have the API credits for them as well.

Looking ahead, I envision a future where model providers offer specialized, domain-specific agents. For example, one optimized for parallel image processing at a low cost, another for machine learning pipeline architecture. In an ecosystem like OpenCode, these specialized brains become just another plugin. By forgoing some of the specialized features of “all-in-one” IDEs, we gain the ability to build a custom, resilient, and highly performant agent stack that lives exactly where our code does.

But building this future-proof stack requires more than just the right tools—it requires an intuitive understanding of how these models behave under pressure. You cannot build a “resilient agent stack” if you don’t know where the underlying models tend to hallucinate or where they excel. Fortunately, for those of us in academia or at forward-thinking enterprises, the “tuition” for this experimentation has already been paid.

This brings us to the most practical advice I can give: the best way to prepare for a future of specialized agents is to aggressively stress-test the ones you have access to right now.

Use Your Tokens Before You Lose Your Tokens#

If you have the privilege of a GitHub Copilot Education license or a workplace-wide Google AI Plus subscription, I have one primary piece of advice: burn through your credits. These institutional offerings provide a unique “sandbox” where you can fail for free. My recommendation for mastering these agents is to start small but think critically.

Begin with a project you already know inside and out. Take a well-documented method and ask the agent to document it from scratch. Does it capture the nuance? Does it understand the “why” behind the logic? Now, expand the scope: provide the agent with the entire class or module and repeat the task. Observe how the quality of the output shifts as you provide more project context. This exercise isn’t just about documentation; it’s about learning the “contextual threshold” of the model you are using.

Once you understand the agent’s baseline, move into active validation:

  • Tooling Audit: Ask the agent to identify all the explicit and implicit configuration options within your codebase. See if it can find the “ghosts” in your architecture.

  • Security & Memory Loops: Ask the agent to generate a memory-safe implementation of a function, then validate that code against a tool like valgrind. If it fails, pass the valgrind error log back into the agent. Watching an agent respond to a debugger’s output is the best way to understand its ability to “reason” through technical constraints.

  • Planning vs. Execution: Use the Plan Mode to have the agent tackle a specific GitHub Issue. Evaluate it not just on the code it writes, but on the logic of the steps it proposes.

We are at a unique juncture where LLMs trained on code are only going to become more pervasive and more capable. Use the opportunity your institution has provided to become a leader in understanding what these agents can—and cannot—do. Identify the patterns that lead to failure and the strategies that lead to success.

It is a tall order to stay ahead of this curve, but as students, scientists, and engineers, we are built for this challenge. Burn the tokens, make the mistakes, and break the models now. These agents are here to stay, and the best time to learn their limitations is while someone else is picking up the tab.

Connecting To GitHub CoPilot#

Warning

When leveraging GitHub CoPilot models, please keep in mind the request multiplier that GitHub enforces for each model call. You can read more about it at this documentation page.

  1. Assuming that you already have setup your GitHub Education benefits and installed OpenCode, follow the instructions here.

  2. Execute /models in the prompt bar to search and select the model that you want to use

Connecting To Google Gemini via OAuth#

Tip

Streamline Your Auth: To keep your configuration manageable, I recommend using either OAuth or an API key, but not both. If you need to reset your credentials, use opencode auth logout to clear existing provider tokens.

  1. Initialize Configuration: Create or open your opencode.json config file (see here for where this file is located).

  2. Enable the Plugin: Add the Gemini authentication plugin to your config:

{
   "$schema": "https://opencode.ai/config.json",
   "plugin": ["opencode-gemini-auth@latest"]
}
  1. Login: Run opencode auth login in your terminal.

  2. Authorize: Select Google as your provider and complete the OAuth flow in your browser.

  3. Set the Model: Use the /models in opencode command to select your Gemini variant.

Note

Stability Tip: In my testing for research-heavy workloads, Gemini 2.5 Flash via OAuth has proven to be the most stable and responsive model within the OpenCode TUI environment.