Continuing the recent spate of AI-related posts ...
Introduction
I often find myself wanting to ask an LLM (large language model) a question about a GitHub repository or arXiv paper. There are several ways to go about this, but one I like is onefilellm.1 It's a simple open-source tool that takes a link (or a directory) and returns the contents as plain text, making it easy for LLMs to parse.
It's a solid tool, but it isn't ergonomic out of the box. In this post, I'll detail how I've configured it in Windows to be easier to use. All the sample code that follows will be for Windows, but setting this up on Mac or Linux should be quite similar. Let me know in the comments if you have done so!
(A previous draft of this post included sample code for Mac and Linux, but I cut it because it was Claude-generated and contained errors.2)
1. Clone the Git Repository
Choose a directory to store Onefilellm. For example's sake, I will choose C:\code\resources.
cd C:\code\resources  # Choose your directory!
git clone https://github.com/jimmc414/onefilellm
cd onefilellm
Then, initialize a virtual environment:
python -m venv .venv
.venv\Scripts\activate
and install dependencies:
pip install -r requirements.txt
2. Set your GitHub Token
On github.com, navigate to the tokens page and create a new (classic) token. If you want the ability to scrape your own private repos, then choose the repo permission. Otherwise, public_repo should work fine as the permission.3
Set GITHUB_TOKEN as an environment variable. You can create a new system environment variable (see below) or type $env:GITHUB_TOKEN="<your-token>" to set it temporarily in PowerShell. A temporary value only lasts for the current PowerShell session.
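To confirm that the token is actually visible to Python before running the tool, a minimal check along these lines may help. The helper name github_token_status is my own and is not part of onefilellm; it only reads the GITHUB_TOKEN variable described above.

```python
import os


def github_token_status(environ=None):
    """Return a short message saying whether GITHUB_TOKEN is set.

    `environ` defaults to the real environment; passing a dict makes
    the function easy to try out without touching your settings.
    """
    env = os.environ if environ is None else environ
    token = env.get("GITHUB_TOKEN", "").strip()
    if not token:
        return "GITHUB_TOKEN is not set"
    # Never print the token itself; report only its length.
    return f"GITHUB_TOKEN is set ({len(token)} characters)"


print(github_token_status())
```

If you created the variable through the system dialog, run this from a newly opened terminal, since terminals that were already open keep their old environment.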
3. Make a script ...
Now, you should be able to run Onefilellm! All you have to do is type
cd C:\code\resources\onefilellm  # whatever your directory is
.venv\Scripts\activate
python onefilellm.py
deactivate
But, that's quite a bit to type each time. Since it's a repetitive task, we can have a terminal script automate these steps!
Writing the script
Basically, in order to run onefilellm, we want to (1) navigate to the onefilellm directory, (2) activate the virtual environment, (3) run the Python file, (4) deactivate the virtual environment, and (5) return to the original directory.
To do so, we can use the following script (in Windows):
@echo off
set ORIGINAL_DIR=%CD%
:: Add your directory here
cd /d "C:\code\resources\onefilellm"
call .venv\Scripts\activate
python onefilellm.py %*
call deactivate
cd /d "%ORIGINAL_DIR%"
In other words:
- @echo off: the commands that follow won't be echoed to the terminal.
- set ORIGINAL_DIR=%CD%: we remember the current directory so that we can return to it later.
- cd /d "C:\code\resources\onefilellm": we move to the onefilellm directory. (As far as I can tell, /d means that you can switch drives on your PC, making it more robust.)
- call .venv\Scripts\activate: activate the virtual environment.
- python onefilellm.py %*: run the Python script, with %* meaning we pass along the arguments that were passed to this batch script.
- call deactivate: deactivate the virtual environment.
- cd /d "%ORIGINAL_DIR%": return to our original directory, switching drives as necessary.
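For comparison, the same steps can be sketched in Python. This is an alternative under stated assumptions, not the approach used in this post: it assumes the install location and the .venv layout from above, and it calls the virtual environment's own interpreter directly, which for a single command has the same effect as activating first. The names INSTALL_DIR, build_command, and run are hypothetical.

```python
import subprocess
from pathlib import PureWindowsPath

# Assumption: the directory used in this post; change it to yours.
INSTALL_DIR = PureWindowsPath(r"C:\code\resources\onefilellm")


def build_command(args, install_dir=INSTALL_DIR):
    """Build the command line that runs onefilellm.py with the venv's Python."""
    venv_python = install_dir / ".venv" / "Scripts" / "python.exe"
    return [str(venv_python), "onefilellm.py", *args]


def run(args, install_dir=INSTALL_DIR):
    """Run onefilellm and return its exit code."""
    # cwd= plays the role of `cd /d`. The caller's working directory is
    # never changed, so there is nothing to restore afterwards.
    completed = subprocess.run(build_command(args, install_dir), cwd=str(install_dir))
    return completed.returncode
```

Ending the file with raise SystemExit(run(sys.argv[1:])), after an import sys, would turn it into a command-line launcher. The batch file is still the simpler option, because Windows can run it by name once it is on PATH.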
Saving the script
Save the script somewhere as onefilellm.bat. In my case, I saved it in C:\scripts.
Putting the script onto PATH
We would like to execute this as a script from anywhere! On Windows, this can be done by putting the folder that contains the file onto PATH. To do so, either follow these instructions on Stack Exchange or do the following:
- Search for "Edit the system environment variables" in the Windows search bar
- This opens up a "System Properties" window. Click "Environment Variables" in the bottom right corner
- Click on "Path" under user variables and then press "Edit"
- Press "New", and add the path of the folder containing the .bat file (C:\scripts in my case)
- Restart PowerShell
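After restarting the terminal, one way to check that the folder made it onto PATH is to ask Python where it would find the command. This is a small sketch using the standard library's shutil.which; the wrapper name locate_command is mine.

```python
import shutil


def locate_command(name="onefilellm"):
    """Return the full path that would be run for `name`, or None."""
    # On Windows, shutil.which also tries the extensions listed in
    # PATHEXT (which normally includes .BAT), so "onefilellm" can
    # resolve to onefilellm.bat.
    return shutil.which(name)


print(locate_command() or "onefilellm was not found on PATH")
```

If this reports that nothing was found, double-check that the PATH entry is the folder and not the .bat file itself.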
Calling the script
To use onefilellm, now just type onefilellm in the terminal from any directory! And if you know what you are going to copy (say https://github.com/jimmc414/onefilellm), then you can just write
onefilellm https://github.com/jimmc414/onefilellm
and the resulting scrape will be copied to your clipboard.
Appendix: Excluding directories
I have found it useful to modify the Python script to exclude any directories which start with "."; this way, I can exclude venv files more easily. (They usually aren't relevant to the repository, but they take up a great deal of space.)
To do so yourself, feel free to copy the working code I've written on a forked branch of the repository.
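The general idea can be sketched with os.walk. This is an illustration under my own assumptions rather than the exact change on the fork: the function name iter_visible_files is hypothetical, and onefilellm's actual traversal code may be organized differently.

```python
import os


def iter_visible_files(root):
    """Yield file paths under `root`, skipping directories that start with '.'."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to descend into
        # the removed directories (.git, .venv, .idea, ...).
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for filename in sorted(filenames):
            yield os.path.join(dirpath, filename)
```

Note that this only catches a virtual environment whose folder is named .venv (as in the setup above), not one named venv.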
Notes
1. I wrote this post before Claude 3.7, which has a GitHub integration, so it is now somewhat less relevant to my daily life. But I still occasionally use chatbots without built-in parsing (and I have some code in files which aren't on GitHub), so I figure I'll still share the note, if only for reference.
2. Thanks to Mikkel Paulson for catching the error, and Julia Evans for the advice about only including code you can test!
3. I am not entirely sure about this. I chose the repo permission and it has seemed to work fine.