Using Onefilellm with a terminal script

Published on February 26, 2025

Continuing the recent spate of AI-related posts ...

Introduction

I often find myself wanting to ask an LLM (large language model) a question about a GitHub repository or an arXiv paper. There are several ways to go about this, but one I like is onefilellm.¹ It's a simple open-source tool that takes a link (or a local directory) and returns its contents as plain text, making it easy for LLMs to parse.

It's a solid tool, but it isn't very ergonomic out of the box. In this post, I'll detail how I've configured it on Windows to make it easier to use. All the sample code that follows is for Windows, but the setup on Mac or Linux should be quite similar. Let me know in the comments if you have done so!

(A previous draft of this post included sample code for Mac and Linux, but I cut it because it was Claude-generated and contained errors.²)

1. Clone the Git Repository

Choose a directory in which to store Onefilellm. For example's sake, I will choose C:\code\resources.

cd C:\code\resources #Choose your directory!
git clone https://github.com/jimmc414/onefilellm
cd onefilellm

Then, initialize a virtual environment:

python -m venv .venv
.venv\Scripts\activate

and install dependencies:

pip install -r requirements.txt
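To double-check that the virtual environment is actually active (for example, before running pip), you can ask the interpreter itself. This is a minimal sketch; in_virtualenv is a hypothetical helper name, and you can save it under any file name and run it with python. It relies on the documented behaviour that inside a venv, sys.prefix points at the environment while sys.base_prefix still points at the base Python installation.

```python
import sys


def in_virtualenv():
    """Return True when this interpreter is running inside a virtual environment."""
    # In a venv, sys.prefix is the environment directory, while
    # sys.base_prefix is the Python installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If activation worked, the interpreter path printed should sit inside the .venv folder of your onefilellm checkout.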

2. Set your GitHub Token

On github.com, navigate to the personal access tokens page and create a new (classic) token. If you want to be able to scrape your own private repos, select the repo scope; otherwise, the public_repo scope should be enough.³

Then set GITHUB_TOKEN as an environment variable. You can either create it permanently through the "Environment Variables" dialog (the same one used below to edit PATH) or run $env:GITHUB_TOKEN="<your-token>" to set it for the current PowerShell session only.
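Before moving on, it can help to confirm that the variable is visible to Python, since onefilellm is a Python script. The sketch below uses a hypothetical helper, describe_github_token (not part of onefilellm), which reports whether GITHUB_TOKEN is set without printing the whole token. Keep in mind that a value set with $env: only exists in that PowerShell window and the programs started from it, so run the check from the same session.

```python
import os


def describe_github_token(env=None):
    """Report whether GITHUB_TOKEN is set, without revealing its full value."""
    env = os.environ if env is None else env
    token = env.get("GITHUB_TOKEN", "")
    if not token:
        return "GITHUB_TOKEN is not set"
    # Show at most the last 4 characters so the token isn't echoed in full.
    return f"GITHUB_TOKEN is set (ends in ...{token[-4:]})"


if __name__ == "__main__":
    print(describe_github_token())
```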

3. Make a script ...

Now, you should be able to run Onefilellm! All you have to do is type

cd C:\code\resources\onefilellm #whatever your directory is
.venv\Scripts\activate
python onefilellm.py
deactivate

But that's quite a bit to type each time. Since it's a repetitive task, we can have a terminal script automate these steps!

Writing the script

Basically, to run onefilellm, we want to (1) navigate to the onefilellm directory, (2) activate the virtual environment, (3) run the Python file, (4) deactivate the virtual environment, and (5) return to the original directory.

To do so, we can use the following batch script (Windows only):

@echo off

set ORIGINAL_DIR=%CD%
:: Replace the path below with your onefilellm directory
cd /d "C:\code\resources\onefilellm"
call .venv\Scripts\activate
python onefilellm.py %*
call deactivate
cd /d "%ORIGINAL_DIR%"

Line by line:

@echo off: stops the commands that follow from being echoed to the terminal.
set ORIGINAL_DIR=%CD%: remember the current directory so that we can return to it later.
cd /d "C:\code\resources\onefilellm": move to the onefilellm directory. The /d switch lets cd change the current drive as well as the directory, so this works even if you start from a different drive.
call .venv\Scripts\activate: activate the virtual environment. (call is needed so that control returns to this script after activate.bat finishes.)
python onefilellm.py %*: run the Python script; %* passes along all the arguments that were given to this batch script.
call deactivate: deactivate the virtual environment.
cd /d "%ORIGINAL_DIR%": return to the original directory, switching drives as necessary.
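The batch file has to activate, deactivate and cd back because everything runs inside your current cmd session. As an alternative sketch (not a replacement for the .bat above), Python's venv documentation notes that activation isn't required if you call the environment's interpreter by its full path, and running that interpreter as a child process with its own working directory makes steps (1), (4) and (5) unnecessary. The directory and the helper names build_command and run_onefilellm are assumptions for illustration.

```python
import subprocess
from pathlib import Path

# Assumption: the repository lives here, matching the batch file above.
ONEFILELLM_DIR = Path(r"C:\code\resources\onefilellm")


def build_command(repo_dir, args):
    """Return [venv python.exe, "onefilellm.py", *args] for the given checkout."""
    # Calling the venv's own interpreter by its full path uses the venv's
    # installed packages, so no activate/deactivate step is needed.
    venv_python = Path(repo_dir) / ".venv" / "Scripts" / "python.exe"
    return [str(venv_python), "onefilellm.py", *args]


def run_onefilellm(args, repo_dir=ONEFILELLM_DIR):
    """Run onefilellm with the given arguments and return its exit code."""
    # cwd= only changes the child's working directory; the caller's shell
    # never leaves its own directory, so there is nothing to return to.
    completed = subprocess.run(build_command(repo_dir, args), cwd=repo_dir)
    return completed.returncode
```

For example, run_onefilellm(["https://github.com/jimmc414/onefilellm"]) would mirror typing onefilellm https://github.com/jimmc414/onefilellm. You would still need a small .bat (or similar) on PATH to invoke it by name, so this mainly helps if you'd rather keep the logic in Python.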

Saving the script

Save the script somewhere as onefilellm.bat. In my case, I saved it in C:\scripts.

Putting the script onto PATH

We would like to run this script from any directory. On Windows, this can be done by adding the folder that contains the file to PATH. To do so, either follow these instructions on Stack Exchange or do the following:

Search for "Edit the system environment variables" in the Windows search bar
This opens a "System Properties" window. Click "Environment Variables" in the bottom right corner
Select "Path" under "User variables" and click "Edit"
Click "New" and add the folder containing the .bat file (C:\scripts in my case), not the path to the file itself
Restart PowerShell so that it picks up the new PATH
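After restarting PowerShell, a quick sanity check is to ask Python whether the folder made it onto PATH and whether the name onefilellm now resolves. This is a minimal sketch: dir_on_path is a hypothetical helper, and C:\scripts is just the folder used in this post. On Windows, shutil.which tries the extensions listed in PATHEXT, so it should find onefilellm.bat when given the bare name.

```python
import os
import shutil


def dir_on_path(directory, path_value=None, sep=os.pathsep):
    """Return True if `directory` is one of the entries in a PATH string."""
    path_value = os.environ.get("PATH", "") if path_value is None else path_value
    # Windows paths are case-insensitive and entries may end in a backslash.
    target = directory.rstrip("\\/").lower()
    entries = (entry.strip().rstrip("\\/").lower() for entry in path_value.split(sep))
    return target in entries


if __name__ == "__main__":
    scripts_dir = r"C:\scripts"  # the folder holding onefilellm.bat in this post
    print("On PATH:", dir_on_path(scripts_dir))
    print("Resolves to:", shutil.which("onefilellm"))
```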

Calling the script

Now you can use onefilellm by typing onefilellm in the terminal from any directory! And if you already know what you want to scrape (say, https://github.com/jimmc414/onefilellm), you can pass it directly:

onefilellm https://github.com/jimmc414/onefilellm

and the resulting scrape will be copied to your clipboard.

Appendix: Excluding directories

I have found it useful to modify the Python script to exclude any directories whose names start with "."; this way, I can easily skip virtual environment folders like .venv. (They usually aren't relevant to the repository but take up a great deal of space.) To do this yourself, feel free to copy the working code I've written on a forked branch of the repository.
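A minimal sketch of the general technique follows. It is not the code on the fork and not onefilellm's own traversal logic; iter_files and is_hidden_dir are hypothetical names. It uses os.walk and prunes dot-directories in place, so their contents are never visited at all.

```python
import os


def is_hidden_dir(name):
    """True for directory names such as .venv, .git or .github."""
    return name.startswith(".")


def iter_files(root):
    """Yield paths of files under root, skipping dot-directories entirely."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to dirnames[:] prunes the walk in place: with the default
        # top-down traversal, os.walk will not descend into removed names.
        dirnames[:] = [d for d in dirnames if not is_hidden_dir(d)]
        for name in filenames:
            # Only directories are filtered; files like .gitignore are kept.
            yield os.path.join(dirpath, name)
```

Where exactly such a filter would slot into onefilellm.py depends on how the script walks directories, so treat it as a pattern rather than a patch.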

Notes

¹ I wrote this post before Claude 3.7, which has a GitHub integration, so onefilellm is now somewhat less relevant to my daily life. But I still occasionally use chatbots without built-in parsing (and I have some code in files that aren't on GitHub), so I figure I'll still share this, if only for reference.
² Thanks to Mikkel Paulson for catching the error, and to Julia Evans for the advice about only including code you can test!
³ I am not entirely sure about this. I chose the repo scope and it has worked fine so far.