What the devil is this?
It's a Signal bot that emits Infobot-style factoids, with an LLM for fuzzy matching.
When a user asks a question, it first checks the database for a verbatim answer, and emits that if it finds one.
If there's no exact match:
- the LLM parses the question into one or more topics ("tell me about alice and bob" becomes ["alice", "bob"])
- the topics are vector-encoded and queried against the encodings in the factoid database
- the nearest match for each topic is sent back to the LLM, which is then asked to phrase it back to the user
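The lookup flow above can be sketched in a few lines of Python. This is a minimal, self-contained illustration, not the bot's actual code: the in-memory FACTOIDS dict stands in for the Postgres factoid table, embed() is a toy character-trigram encoder standing in for whatever embedding model the real bot uses, and extract_topics() is a hypothetical placeholder for the LLM call that splits a question into topics. The final phrasing step is left out; the function just returns the (topic, response) pairs that would be handed to the LLM.

```python
import math
from collections import Counter

# Hypothetical stand-in for the factoid table: topic -> response.
FACTOIDS = {
    "alice": "a cryptographer who talks to bob a lot",
    "bob": "alice's usual correspondent",
    "signal": "an end-to-end encrypted messenger",
}


def embed(text: str) -> Counter:
    """Toy encoder: character-trigram counts (stand-in for a real embedding model)."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Encode every topic once up front, as the import step does for the real database.
ENCODED = {topic: embed(topic) for topic in FACTOIDS}


def extract_topics(question: str) -> list[str]:
    """Hypothetical placeholder for the LLM topic-extraction call:
    drops a leading 'tell me about' and splits on ' and '."""
    q = question.lower().strip().rstrip("?")
    q = q.removeprefix("tell me about ").strip()
    return [t.strip() for t in q.split(" and ") if t.strip()]


def answer(question: str) -> list[tuple[str, str]]:
    key = question.lower().strip().rstrip("?")
    # 1. Verbatim match first.
    if key in FACTOIDS:
        return [(key, FACTOIDS[key])]
    # 2. Otherwise, find the nearest encoded factoid for each extracted topic.
    results = []
    for topic in extract_topics(question):
        vec = embed(topic)
        best = max(ENCODED, key=lambda t: cosine(vec, ENCODED[t]))
        results.append((best, FACTOIDS[best]))
    # 3. In the real bot these pairs go back to the LLM to be phrased for the user.
    return results
```

For example, answer("tell me about alice and bob") takes the fuzzy path and returns one pair per topic, while answer("tell me about alicee") still lands on "alice" because most of its trigrams overlap. The sketch needs Python 3.9+ for str.removeprefix.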
Why?
Huh?
What are its limitations?
It doesn't have much feature parity with the old Perl infobot. The right way to get that might be to hack on the old bot code and use it as the main chat parser for this. I don't have a ton of desire to sit down and write my own implementation of the entire thing.
Some important stuff we're missing right now:
- Creating new factoids
- Understanding when questions are being asked of the bot so it doesn't just respond to every single thing that's said
- Botsnacks.
And some non-infobot stuff we could use:
- A better prompt for the LLM to integrate multiple factoids into a single response
- Some security precautions against prompt injection etc. -- at the moment it's just "trust only those on the allowlist"
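As a rough idea of how the allowlist check and the "was that actually a question for the bot?" problem could sit in front of the lookup, here is a hypothetical gate function. Everything in it is an assumption for illustration: the ALLOWLIST numbers, the BOT_NAME, and the naive heuristic of answering only when the bot is addressed by name or the message ends in "?". It does nothing about prompt injection; it only decides whether to respond at all.

```python
import re
from typing import Optional

# Hypothetical values; the real bot keeps its allowlist in its own config.
ALLOWLIST = {"+15550001111", "+15550002222"}
BOT_NAME = "infobot"

# Matches "infobot: ...", "infobot, ..." or "infobot ..." at the start of a message.
ADDRESSED = re.compile(rf"^\s*{re.escape(BOT_NAME)}[,:]?\s+(?P<body>.+)$", re.IGNORECASE)


def extract_query(sender: str, text: str) -> Optional[str]:
    """Return the text the bot should answer, or None to stay quiet."""
    if sender not in ALLOWLIST:
        return None  # mirrors the current "trust only those on the allowlist" rule
    match = ADDRESSED.match(text)
    if match:
        return match.group("body").strip()
    stripped = text.strip()
    # Naive heuristic: treat anything ending in "?" as a question.
    if stripped.endswith("?"):
        return stripped
    return None
```

A real infobot-style parser would need to be smarter than this (it would still answer rhetorical questions, for instance), but it shows where such a check would slot in.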
How do I use it?
You need CUDA working in Docker, or you'll have to edit the Docker Compose files to remove the GPU configuration and run on the CPU instead.
At present the LLM seems to require about 2GB of GPU RAM, which is really small as LLMs go. My PC works harder playing Balatro.
Initialize the database
At present it only works on a static copy of factoids imported from an infobot. So you need to import some.
Importing 300k factoids took around an hour and a fair amount of compute/GPU power. There's no consistency or duplicate checking at the moment, so you're best off trashing the Postgres data directory first.
- Dump the factoid database into "is.txt" and "are.txt" and put them in scripts/ (tab-separated lines: "topic\tresponse")
- make import_factoids
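If you want to sanity-check a dump before importing it, the tab-separated format can be read with a small parser like the one below. This is a sketch under assumptions, not the importer in scripts/: the Factoid type, the choice to skip malformed lines instead of aborting, and deriving the copula ("is" or "are") from the file name are all hypothetical. Since the real import does no duplicate checking, a simple case-insensitive dedupe() helper is included too.

```python
from pathlib import Path
from typing import Iterable, Iterator, NamedTuple


class Factoid(NamedTuple):
    topic: str
    copula: str  # "is" or "are", assumed to come from the file name
    response: str


def parse_lines(lines: Iterable[str], copula: str, source: str = "<input>") -> Iterator[Factoid]:
    """Parse "topic<TAB>response" lines; any further tabs stay in the response."""
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue
        topic, sep, response = line.partition("\t")
        if not sep or not topic.strip() or not response.strip():
            # Hypothetical policy: warn and skip rather than abort the import.
            print(f"{source}:{lineno}: skipping malformed line")
            continue
        yield Factoid(topic.strip(), copula, response.strip())


def parse_dump(path: Path) -> Iterator[Factoid]:
    """Parse scripts/is.txt or scripts/are.txt; the stem gives the copula."""
    with path.open(encoding="utf-8", errors="replace") as fh:
        yield from parse_lines(fh, path.stem, str(path))


def dedupe(factoids: Iterable[Factoid]) -> list[Factoid]:
    """Keep the first factoid seen for each case-insensitive topic."""
    seen: set[str] = set()
    kept = []
    for f in factoids:
        key = f.topic.lower()
        if key not in seen:
            seen.add(key)
            kept.append(f)
    return kept
```

For instance, dedupe(parse_dump(Path("scripts/is.txt"))) gives a quick count of how many unique topics the import would actually create.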
Prepare a Signal account
Getting signal-cli working can be fairly involved. Check the wiki in the signal-cli repo for details, and run this command to get a shell inside the signal-cli container:
- make signalbash
Create an env file
- Copy env.in to env and edit its contents as needed.
Start the server
- make