This is a Telegram bot that tests your ability to understand spoken language.
Intended users are constructed language enthusiasts who rarely have an opportunity to hear their conlangs spoken.
The bot will send you voice messages and await your transcription. If the transcription is correct, the bot will give you experience points.
Disclaimer: this is my second bot, so there are probably massive antipatterns everywhere.
Bot | Language | Bot's site |
---|---|---|
@lfnescutabot | Elefen |
Let's say you want to deploy an instance for Newspeak.
Follow the steps below.
Talk to BotFather and create a new bot. The bot's name can be anything and is not important; what we really need is the token.
Download or clone this repository to your server.
Create a file `language-newspeak.xml` inside the `botvars/` folder.
There are examples of XML files for Elefen, Esperanto and Interslavic.
The `<alphabet>` section should list the letters that are worth experience points (letter case does not matter).
For Newspeak it will just contain the Latin alphabet.
The `<transform>` section is needed if your language has several alphabets or alternative spellings for letters; it explains how to map those alternative letters to letters in `<alphabet>`.
For Newspeak this section will be empty.
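Here is a minimal Python sketch of how the two sections could be applied when scoring text. It does not parse the XML file, because its exact schema is not described here. Instead it assumes hypothetical `ALPHABET` and `TRANSFORM` objects already loaded from that file, and it assumes one-character-to-one-character mappings. The Cyrillic entries only illustrate a multi-alphabet language such as Interslavic.

```python
# Hypothetical in-memory form of the two XML sections; the real file format
# and loader are not shown here.
ALPHABET = set("abcdefghijklmnopqrstuvwxyz")

# Illustrative transform table (Newspeak needs none): Cyrillic letters mapped
# to their Latin counterparts, one character to one character.
TRANSFORM = {"а": "a", "б": "b", "в": "v"}


def normalize(text: str) -> str:
    """Lowercase the text (case does not matter) and apply TRANSFORM."""
    return "".join(TRANSFORM.get(ch, ch) for ch in text.lower())


def count_scoring_letters(text: str) -> int:
    """Count characters that are ALPHABET letters after normalization."""
    return sum(1 for ch in normalize(text) if ch in ALPHABET)


print(count_scoring_letters("War is peace."))  # 10
print(normalize("Ва"))  # va
```

Spaces and punctuation are ignored because they are not in `ALPHABET`. Under this sketch's assumptions, only real letters would count towards experience points.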
The next step is to translate the bot's messages from (my bad) English into Newspeak.
First, generate a `.po` file:

```shell
pygettext3 -o newspeak.po code/
```

Open it and write the translations.
See examples in `elefen.po`.
After that, convert it to the `.mo` format, which the bot can find and read:

```shell
mkdir -p locales/newspeak/LC_MESSAGES/
msgfmt -o locales/newspeak/LC_MESSAGES/cmpdbot.mo newspeak.po
```

What matters here is that the resulting `cmpdbot.mo` file ends up in the correct directory.
Now create a `.env` file inside the `botvars/` folder.
Its actual name is not very important: it can be `newspeak.env`, `bot.env`, etc.
The examples below assume you called it just `.env`.

```shell
cp -n botvars/example-.env botvars/.env
chmod 0600 botvars/.env
```
Edit the variables inside it:
Variable | Comment |
---|---|
`CMPDBOT_DIR` | Type: path to folder. Example: `/cmpdbot`. Where all bot-related files are copied inside the Docker container. |
`CMPDBOT_LANGUAGE` | Type: locale name. Example: `newspeak`. Language locale folder, as in `locales/newspeak/LC_MESSAGES/`. |
`CMPDBOT_LANGUAGE_SITE` | Type: URL. Example: `https://en.wikipedia.org/wiki/Newspeak` |
`CMPDBOT_LANGUAGE_FILE` | Type: file name. Example: `language-newspeak.xml`. Name of the language XML file inside `botvars/`. |
`CMPDBOT_SIMILARITY_RATIO` | Type: float. Example: `0.8`. Comparing two phrases with Levenshtein `ratio()` yields a float; if that float is greater than `CMPDBOT_SIMILARITY_RATIO`, the two phrases are considered "similar". Right now the bot doesn't do much with it, but future versions might use this value for better phrase management. |
`CMPDBOT_TOKEN` | Type: Telegram API token. Example: `123456:ABC-DEF1234ghIkl-zyx57W2v1u123ew11`. The secret that you've acquired from BotFather. |
`CMPDBOT_LOCALE_DIR` | Type: path to folder. Example: `/cmpdbot/locales`. Where the bot finds its localization inside the container. |
`CMPDBOT_LINK` | Type: URL. Example: `https://newspeak.bot/`. Official site of your bot, if there is one. |
`CMPDBOT_L10N_DOMAIN` | Type: locale domain. Example: `cmpdbot`. File name of `locales/newspeak/LC_MESSAGES/cmpdbot.mo` without the `.mo` suffix. |
`CMPDBOT_MASK` | Type: string. Example: `?`. Symbol to use when masking phrases. |
`CMPDBOT_EXCHANGE_DIR_LOCAL` | Type: path to folder. Example: `/home/wsmith/newspeak-bot-volume`. Folder on the host machine that serves as a Docker volume. |
`CMPDBOT_EXCHANGE_DIR_CONTAINER` | Type: path to folder. Example: `/cmpdbot/exchange`. Folder that serves as the same Docker volume inside the container. |
`CMPDBOT_CONST_START` | Type: string. Example: `start`. Callback data of the start button. (Technical value. Leave as is.) |
`CMPDBOT_MIN_SILVER` | Type: float. Example: `0.7`. If the transcription's similarity to the original phrase is higher than this value, the user gets the "silver medal" sticker. |
`CMPDBOT_MIN_BRONZE` | Type: float. Example: `0.3`. If the transcription's similarity to the original phrase is higher than this value, the user gets the "bronze medal" sticker. |
`CHOOSE_SEVRAL_TIMEZ` | Type: integer. Example: `5`. Maximum number of consecutive challenge lookups. (Technical value. Leave as is.) |
`CHOOSE_CHANCE_PHRASE` | Type: integer. Example: `1`. How often the user should be asked to add a new phrase to the database. The greater the value, the more often. |
`CHOOSE_CHANCE_VOICE` | Type: integer. Example: `3`. How often the user should be asked to submit a voice message. The greater the value, the more often. |
`CHOOSE_CHANCE_TRANSCRIPTION` | Type: integer. Example: `9`. How often the user should be asked to transcribe a voice message. The greater the value, the more often. |
`CHOOSE_MIN_XP_PHRASE` | Type: integer. Example: `1000`. How many experience points a user needs before they are allowed to add a new phrase to the database. |
`CHOOSE_MIN_XP_VOICE` | Type: integer. Example: `100`. How many experience points a user needs before they are allowed to submit a voice message. |
`CHOOSE_SAMPLE_PHRASE` | Type: integer. Example: `10`. How many random phrases are selected from the database before being passed to the chooser module. Lowering it increases "randomness". |
`CHOOSE_SAMPLE_VOICE` | Type: integer. Example: `10`. How many random voice recordings are selected from the database before being passed to the chooser module. Lowering it increases "randomness". |
`CHOOSE_SUCCESS_BOOST` | Type: integer. Example: `7`. The number of letters the user transcribes correctly is saved as their "last success". Next time, the bot tries to choose a phrase whose length is closer to last success + boost. |
`CHOOSE_HOLD_SECONDS` | Type: integer. Example: `172800` (48 hours). Number of seconds that must pass before a user's phrase or voice recording can be used as a challenge for other users. |
`LOG_LEVEL` | Type: string. Example: `DEBUG` |
`POSTGRES_USER` | Type: string. Example: `newspeakbot` |
`POSTGRES_PASSWORD` | Type: string. Example: `newspeakbot2+2=5` |
`POSTGRES_HOST` | Type: string. Example: `pg`. Postgres host name. If you plan to use the supplied Compose file and hence its Postgres container, keep the value `pg`. |
`POSTGRES_PORT` | Type: integer. Example: `5432` |
`POSTGRES_DB` | Type: string. Example: `newspeakbot` |
`MIGRATIONS_SYNC` | Type: boolean. Example: `1`, or leave it empty. Whether the bot should delay its start until database migrations are applied. It generally should. (Why did I make it configurable?) |
`MIGRATIONS_HOST` | Type: string. Example: `migrations`. Host name of the container that runs database migrations. If you plan to use the supplied Compose file, keep the value `migrations`. |
`MIGRATIONS_PORT` | Type: integer. Example: `10946` |
`S3_HOST` | Type: string. Example: `s3`. S3 host name. If you plan to use the supplied Compose file and hence its MinIO container, keep the value `s3`. |
`S3_PORT` | Type: integer. Example: `9000` |
`S3_ACCESS_KEY` | Type: string. Example: `newspeakbot` |
`S3_SECRET_KEY` | Type: string. Example: `newspeakbot2+2=1984` |
`S3_VOICES_BUCKET` | Type: string. Example: `newspeakbotvoices`. Name of the S3 bucket where the voice binaries are stored. |
`STICKER_PHR` | Type: string. Example: `CAACAgIAAxkBAAMDYCujE1oR-Zjt5IwdddnxkWQxPCIAAhMAA2XuFBAo2PGrWKbT_B4E`. Phrase challenge sticker. |
`STICKER_VOC` | Type: string. Example: `CAACAgIAAxkBAAMEYCujRVhYHR18bF0j60fLGjuwLqAAAhIAA2XuFBD_h-TjUQABNUMeBA`. Voice recording challenge sticker. |
`STICKER_TRS` | Type: string. Example: `CAACAgIAAxkBAAIKUWA4yrD4dH2G4UXKd0wgfgYFb0YeAAJfDAAC8XTJSfpu_cyvBjl3HgQ`. Transcription challenge sticker. |
`STICKER_GOLD` | Type: string. Example: `CAACAgIAAxkBAAMGYCujn4-bfI6aGs6695L5Yc5fn3wAAhcAA2XuFBB3ge6WMuz0fx4E`. Gold medal sticker. |
`STICKER_SILVER` | Type: string. Example: `CAACAgIAAxkBAAMHYCujxdhBazEQ4PPC2onSHXBPNnQAAhgAA2XuFBApYLlTBEVLlR4E`. Silver medal sticker. |
`STICKER_BRONZE` | Type: string. Example: `CAACAgIAAxkBAAMIYCukAocAAXoPOcKVRUR7NC8Pe2u7AAIZAANl7hQQFV0zhLMubpkeBA`. Bronze medal sticker. |
`STICKER_PAPER` | Type: string. Example: `CAACAgIAAxkBAAMJYCukLP7LfgTJsgWP-5UdwFs4_zIAAhoAA2XuFBD0kfJTn_og1x4E`. Toilet paper medal sticker. |
`STICKER_OK_PHR` | Type: string. Example: `CAACAgIAAxkBAAIKVGA4-iFDwkiwCLsdFti22lRi-gABngACRQoAAjZlwUnnv8Xv3hNHQB4E`. Phrase saved sticker. |
`STICKER_OK_VOC` | Type: string. Example: `CAACAgIAAxkBAAIKVWA4-j1LUUkNEolVUOgZ1CwnLsSbAAKHCgACZvXISdcVNZqQTrzoHgQ`. Voice saved sticker. |
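To show how the similarity thresholds above interact, here is a self-contained Python sketch. `ratio()` recomputes the indel-based normalized similarity that Levenshtein `ratio()` reports, which avoids a third-party dependency. Comparing two phrases with it against `CMPDBOT_SIMILARITY_RATIO` gives the "similar" test. The `medal()` function is hypothetical. Three choices in it are assumptions, not documented behaviour:

- an exact (case-insensitive) match counts as gold;
- anything at or below `CMPDBOT_MIN_BRONZE` gets the toilet paper medal;
- comparison is case-insensitive.

```python
import os


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def ratio(a: str, b: str) -> float:
    """Indel-normalized similarity in [0, 1]: 1 - indel_distance / (len(a) + len(b))."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    # indel_distance = total - 2 * LCS, so the ratio simplifies to 2 * LCS / total.
    return 2 * lcs_length(a, b) / total


# Thresholds from the table; the defaults are the example values.
MIN_SILVER = float(os.environ.get("CMPDBOT_MIN_SILVER", "0.7"))
MIN_BRONZE = float(os.environ.get("CMPDBOT_MIN_BRONZE", "0.3"))


def medal(original: str, transcription: str) -> str:
    """Map a transcription to a sticker name (gold and paper rules are assumptions)."""
    score = ratio(original.lower(), transcription.lower())
    if score == 1.0:  # assumption: gold means a perfect transcription
        return "gold"
    if score > MIN_SILVER:
        return "silver"
    if score > MIN_BRONZE:
        return "bronze"
    return "paper"  # assumption: the toilet paper medal is everything below bronze


print(medal("War is peace.", "war is piece"))  # silver (ratio 22/25 = 0.88)
```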
Next, create an environment variable `DOTENV_FILE` holding your `.env` file name (just `.env` in these examples).
There are probably better, more efficient ways to do this. Anyway:

```shell
export DOTENV_FILE=.env
```
Now invoke `docker-compose up`:

```shell
docker-compose --env-file botvars/$DOTENV_FILE up --scale manage=0 --build --remove-orphans
```
It will produce a lot of output while downloading and starting everything.
You know that it (probably) worked when you see `INFO:aiogram.dispatcher.dispatcher:Start polling.`
Now stop it with `Ctrl+C` and relaunch everything in the background with `-d`:

```shell
docker-compose --env-file botvars/$DOTENV_FILE up --scale manage=0 --build --remove-orphans -d
```
The next step is adding some initial Newspeak phrases to the bot's database.
The `manage` container has a command for it: `botaddphrases`.
It takes a file where each line contains one phrase and adds them all in one go.
Create a file `phrases.txt` inside the Docker volume folder (`CMPDBOT_EXCHANGE_DIR_LOCAL`), for example by running this from that folder:

```shell
cat > phrases.txt <<1984
War is peace.
Freedom is slavery.
Ignorance is strength.
1984
```
Now launch the `manage` container:

```shell
docker-compose --env-file botvars/$DOTENV_FILE run --rm manage bash
```

And inside the container, invoke the command:

```shell
botaddphrases $CMPDBOT_EXCHANGE_DIR_CONTAINER/phrases.txt
```
Sometimes the command will complain about similar phrases that are already in the database.
Type `y` if you wish to insert the phrase nonetheless.
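The similarity check can be pictured with this hypothetical Python sketch of a `botaddphrases`-like import. It is not the command's actual code, and the helper names are made up. `difflib.SequenceMatcher.ratio()` from the standard library stands in for Levenshtein `ratio()`, so its scores differ slightly. Phrases scoring above `CMPDBOT_SIMILARITY_RATIO` are flagged, and a `confirm` callback plays the role of typing `y`.

```python
import difflib
import os
from typing import Callable

SIMILARITY_RATIO = float(os.environ.get("CMPDBOT_SIMILARITY_RATIO", "0.8"))


def read_phrases(path: str) -> list[str]:
    """One phrase per line; blank lines and surrounding whitespace are dropped."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def similar_to(phrase: str, existing: list[str]) -> list[str]:
    """Existing phrases whose similarity to `phrase` exceeds SIMILARITY_RATIO."""
    return [
        old
        for old in existing
        # difflib is a stdlib stand-in for Levenshtein ratio(); scores differ slightly.
        if difflib.SequenceMatcher(None, phrase.lower(), old.lower()).ratio() > SIMILARITY_RATIO
    ]


def import_phrases(
    new: list[str], existing: list[str], confirm: Callable[[str, list[str]], bool]
) -> list[str]:
    """Return the phrases to insert; `confirm` decides about near-duplicates."""
    accepted: list[str] = []
    for phrase in new:
        clashes = similar_to(phrase, existing + accepted)
        if clashes and not confirm(phrase, clashes):
            continue  # the user did not answer "y": skip this phrase
        accepted.append(phrase)
    return accepted


def ask_user(phrase: str, clashes: list[str]) -> bool:
    """Interactive confirmation, similar in spirit to typing y at the prompt."""
    answer = input(f"{phrase!r} looks similar to {clashes}. Insert anyway? [y/N] ")
    return answer.strip().lower() == "y"


# Non-interactive example: reject every near-duplicate.
print(import_phrases(["War is peace!", "Freedom is slavery."], ["War is peace."], lambda p, c: False))
# ['Freedom is slavery.']
```

Interactively, you would call it as `import_phrases(read_phrases(path), existing, ask_user)`.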
Almost there. Now it's time to add voice recordings.
It's easy to do through the bot itself. Start a conversation with it in Telegram and it will send you voice recording tasks.
Note, though, that if you've set `CHOOSE_MIN_XP_VOICE` to anything greater than `0`, the bot will send you nothing, because your XP is `0`.
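To bypass this, you can cheat your XP in the database. Open a shell in the Postgres container and start `psql`:

```shell
docker exec -ti $( docker ps -f "ancestor=postgres" -q ) bash
psql -U $POSTGRES_USER
```

Then, at the `psql` prompt:

```sql
UPDATE person SET xp = 9000 WHERE id = <YOUR ID>;
```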
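Or you can temporarily set `CHOOSE_MIN_XP_VOICE` to `0` and restart the bot.
A plain `docker-compose restart` keeps the environment the containers were created with, so re-run `up` to recreate them with the new value:

```shell
docker-compose --env-file botvars/$DOTENV_FILE up --scale manage=0 --build --remove-orphans -d
```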
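Why does a fresh account get nothing? This hypothetical Python sketch shows how the `CHOOSE_*` settings could combine, using the example values from the variables table. It is an illustration, not the bot's chooser module. Three parts of it are assumptions:

- the weighted draw;
- transcription tasks having no XP minimum;
- the `available` parameter, which lists the task types that currently have content.

```python
import random
from typing import Optional

# Example values from the .env table (hypothetical wiring; the real chooser may differ).
CHANCES = {"phrase": 1, "voice": 3, "transcription": 9}  # CHOOSE_CHANCE_*
MIN_XP = {"phrase": 1000, "voice": 100, "transcription": 0}  # CHOOSE_MIN_XP_*; transcription minimum assumed
SUCCESS_BOOST = 7  # CHOOSE_SUCCESS_BOOST
HOLD_SECONDS = 172800  # CHOOSE_HOLD_SECONDS (48 hours)


def choose_task(xp: int, available: set[str], rng: random.Random) -> Optional[str]:
    """Weighted random task type among those the user qualifies for.

    `available` holds the task types that currently have content to offer
    (e.g. no "transcription" while the database contains no voices).
    """
    tasks = [t for t in CHANCES if t in available and xp >= MIN_XP[t]]
    if not tasks:
        return None  # nothing to send
    return rng.choices(tasks, weights=[CHANCES[t] for t in tasks])[0]


def is_released(created_at: float, now: float) -> bool:
    """Other users only get content that is at least HOLD_SECONDS old."""
    return now - created_at >= HOLD_SECONDS


def pick_phrase(phrases: list[str], last_success: int) -> str:
    """Prefer the phrase whose letter count is closest to last success + boost."""
    target = last_success + SUCCESS_BOOST
    return min(phrases, key=lambda p: abs(sum(ch.isalpha() for ch in p) - target))


rng = random.Random(42)
# Fresh database: phrases exist but no voices yet, and the user has 0 XP.
print(choose_task(0, {"phrase", "voice"}, rng))  # None
print(choose_task(100, {"phrase", "voice"}, rng))  # voice
```

Under these assumptions, lowering `CHOOSE_MIN_XP_VOICE` to `0` (or raising your XP) is exactly what makes the voice task eligible.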
Let me know about your instance so I can update my table of existing bots. Thank you.
The bot operates with the following entities each stored in their own table in Postgres:
Pg table | Meaning |
---|---|
`person` | Telegram users. |
`phrase` | Catalog of phrases in textual form. |
`voice` | Submitted voice messages. (The binaries are stored in S3.) |
`transcription` | Submitted transcriptions of voices. |
`challenge` | Every task sent to users is represented as a challenge entity. |
To back up the data, use `pg_dump` for Postgres and `mc mirror` for the S3 voices bucket.
The meaning of the `is_active` column:
Pg table | is_active means... |
---|---|
`person` | `is_active = false` means that the user is shadow-banned. All their new submissions are saved with `is_active = false`. |
`phrase` | `is_active = false` means that the phrase will not be sent to users as a voice recording challenge. |
`voice` | `is_active = false` means that the voice will not be sent to users as a transcription challenge. |
`transcription` | Not used for transcriptions. |
`challenge` | The user's current challenge has `is_active = true`. All past challenges (completed and skipped) have `is_active = false`. |
Setting `is_active = false` for a user doesn't block their already existing active phrases and voices.
If desired, block them individually, or run `botblockuser <PERSON_ID>` inside the `manage` container, which blocks all of the user's existing content.
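The difference between shadow-banning a user and blocking their existing content can be reproduced with Python's built-in `sqlite3` as a stand-in for Postgres. The schema below is a hypothetical, trimmed-down version of the real tables. `block_existing_content()` only approximates what `botblockuser` is described as doing.

```python
import sqlite3

# In-memory SQLite stand-in for Postgres; hypothetical, trimmed-down schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, is_active BOOLEAN NOT NULL DEFAULT 1);
    CREATE TABLE phrase (
        id INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES person (id),
        text TEXT NOT NULL,
        is_active BOOLEAN NOT NULL DEFAULT 1
    );
    CREATE TABLE voice (
        id INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES person (id),
        is_active BOOLEAN NOT NULL DEFAULT 1
    );
    """
)


def add_phrase(person_id: int, text: str) -> None:
    """New submissions copy the author's is_active flag (shadow-ban behaviour)."""
    conn.execute(
        "INSERT INTO phrase (person_id, text, is_active) "
        "SELECT id, ?, is_active FROM person WHERE id = ?",
        (text, person_id),
    )


def block_existing_content(person_id: int) -> None:
    """Deactivate everything the user already submitted (approximating botblockuser)."""
    for table in ("phrase", "voice"):
        conn.execute(f"UPDATE {table} SET is_active = 0 WHERE person_id = ?", (person_id,))


def active_phrases() -> list[str]:
    """Phrases that may still be sent as voice recording challenges."""
    return [row[0] for row in conn.execute("SELECT text FROM phrase WHERE is_active ORDER BY id")]


conn.execute("INSERT INTO person (id) VALUES (1)")
add_phrase(1, "War is peace.")
conn.execute("UPDATE person SET is_active = 0 WHERE id = 1")  # shadow-ban
add_phrase(1, "Freedom is slavery.")  # saved, but inactive
print(active_phrases())  # ['War is peace.']: old content is still active
block_existing_content(1)
print(active_phrases())  # []
```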
Things that also should be added soon
- Testing
- Easy monitoring
- Admin site
- In-bot reporting
- Show best XP
- Show generated XP
- Async SQLAlchemy
- In-bot submissions management (delete user data)
- Delete old/blocked content
- SQL indices
- In-bot management