I created and currently maintain a Discord bot that is supposed to detect messages that indicate signs of suicide, and direct people to resources that can help. Originally, this was done with a list of key words and phrases that the bot checked each message against. From the beginning, the team of volunteer contributors who have all pitched in to help over the years have mostly come to the same conclusion: That solution sucks.
For example, take the key word "kys". This should be picked up as an insult that could cause harm to someone else. The problem begins when someone uses the word skyscraper inside of a message. That should not get picked up, because the person is talking about a tall building or maybe Cities: Skylines or something.
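To make the "kys" versus "skyscraper" problem concrete, here is a minimal sketch. It assumes the original check was a plain substring test, which is my guess at the failure mode rather than the bot's actual code. The `KEYWORDS` list and both function names are made up for illustration.

```python
import re

# Hypothetical one-entry list; the bot's real keyword list is not shown here.
KEYWORDS = ["kys"]


def naive_match(message: str) -> bool:
    """Substring check: flags a message if a keyword appears anywhere in it."""
    text = message.lower()
    return any(keyword in text for keyword in KEYWORDS)


# \b only matches at a word boundary, so "kys" has to stand alone as a word.
WORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def whole_word_match(message: str) -> bool:
    """Word-boundary check: ignores keywords buried inside longer words."""
    return WORD_PATTERN.search(message) is not None
```

With this, `naive_match("look at that skyscraper")` fires while `whole_word_match` does not. Word boundaries only patch this one case, though. A keyword list still can't tell a joke or a quote from the real thing.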
We have considered using external sentiment analysis APIs in the past, but those are often expensive, slow, or both. We would quickly overwhelm the API's server with requests, even with caching. Thus, we went with the sucky-but-functional method of a keyword list.
One day, I had enough. I decided to rewrite the bot from the ground up. I remade the entire thing, and passed each message through a script that I left wide open. I planned on going with server-side sentiment analysis. Can't be that hard, right? Simple binary text classification, max 4000 characters, easy. I decided to go with TensorFlow. They seemed to have a simple enough tutorial on text classification, even though it was in Python and the bot was in Node. They seemed to have a way to convert from TF.py to TF.js, so that was good.
First things first, set up hardware acceleration. I use the 2021 base model MacBook Pro 14", which should feature AI hardware acceleration, at least according to this page from Apple.
It took me ages just to get that much working. I used acceleration for one session, and didn't bother setting it back up next time. To Apple's credit, the CPU on my laptop is so good that it's fast enough without GPU acceleration for me to work.
...if only TensorFlow worked. It seems like they have a very very highly curated front page and nothing else. It's like almost-abandonware on the brink of collapse. It was like when you find the one rare page from Google or Apple that's so old that it hasn't been updated since 2008, so lost that even the company doesn't know it exists. Something like the Apple Mailing Lists or the Twitter button in Google Blogger.
I was perfectly able to follow the Python tutorial for binary text classification on the IMDb dataset. It was as simple as copy and paste. At the end, they mention the ability to save models. This really caught my eye. If I could load the pretrained weights, that would be great! I wouldn't need to sit through training every time I wanted a new string in the script.
Exporting the model to a file was easy. That was as simple as export_model.save('saved_model.keras'). The problem was loading the saved model. It took me days to figure this out, with tons of inexplicable errors along the way. Here it is:
import re
import string
import tensorflow as tf

# Registering the function lets Keras resolve it when loading the saved model.
@tf.keras.saving.register_keras_serializable("custom_standardization")
def custom_standardization(input_data):
    lowercase = tf.strings.lower(input_data)
    stripped_html = tf.strings.regex_replace(lowercase, '<br />', ' ')
    return tf.strings.regex_replace(stripped_html, '[%s]' % re.escape(string.punctuation), '')

new_model = tf.keras.models.load_model('saved_model.keras')
That block of code is what I needed to load the model from the file. From there, you can call new_model.predict() to your heart's content.
Anyway, what I needed now was to convert this to TensorFlow.js. Given the week of sparse, inaccurate documentation I had dealt with already, I was worried about whether this would go smoothly. I was right to be worried. As I type this, I still haven't managed to do it.
Right off the bat, I tried their "easy converter".
Okay... maybe I'll use Windows with an x86 CPU...
hmmm... tensorflow_decision_forests wasn't found. Lemme look up that error.
How is it physically possible to make a program not work with Windows? I have no idea. The fact that this "easy" converter program was paraded around as some solution when nobody bothered to port it to Windows is insane.
Well here we go again, another day another platform.
And it finally worked!
...
...
in my fucking dreams.
Online posts indicate that this is supposed to mean that my file was corrupted, but I know for sure that it's not. At this point, I gave up on the stupid tool and tried a different route. Time for the recommended "alternative".
I guess I have to retrain the model and export it to a file at the end, just like before. I'm sure this will go flawlessly.
Fun reminder: I am on my Windows machine. There is no hardware acceleration here, and the CPU is terribly optimized. This is running in WSL, so Linux can't even access all the CPU instructions and needs to emulate some in software. GPU acceleration is out of the question.
It took an hour to retrain the model, 12 epochs at 5 mins per epoch. Error at the very end.
It is 3am and I just had a realization. I wonder if I could load the model with the script I put above and debug the export far faster with that. No clue if that would work, but it's an idea I'll have to get to later. If I don't post an update, assume it failed.
Really, my biggest issues are the complete lack of documentation for anything, the randomly closed GitHub issues, and more. The "beginner" guides explain very little, and it's honestly sad to see the state of it all. I came for basic binary text classification, and it really should not be this hard.
- I could, in theory, rewrite the bot AGAIN in Python, which is slow and bad.
- I could also have the Python side act as a web server and return the classification as a response, all sent over localhost for optimized speed or something.
- I could just switch away from TensorFlow entirely.
I am really looking at number 3 as my decision right now, but we'll see where this goes.
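For what it's worth, the second option doesn't have to be much code. Below is a rough sketch using only Python's standard library, not something I have actually wired into the bot. The `make_server` helper, the `placeholder_classify` stub, port 8000, and the `{"text": ...}` / `{"score": ...}` JSON shapes are all assumptions for illustration. A real version would swap the stub for a call to the loaded model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


def make_server(classify: Callable[[str], float], port: int = 8000) -> HTTPServer:
    """Build a localhost-only server that scores POSTed text with `classify`."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            # Expect a JSON body shaped like {"text": "..."} (assumed format).
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            score = float(classify(payload.get("text", "")))

            body = json.dumps({"score": score}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args) -> None:
            pass  # keep the console quiet

    # Bind to 127.0.0.1 so only local processes (like the bot) can reach it.
    return HTTPServer(("127.0.0.1", port), Handler)


def placeholder_classify(text: str) -> float:
    """Stand-in for the real model; always returns 0.0."""
    return 0.0

# To run it: make_server(placeholder_classify).serve_forever()
```

The Node side would then POST each message to http://127.0.0.1:8000/ and read the score out of the JSON response. The model stays loaded in one long-lived Python process, so nothing gets reloaded per message.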
Goodnight everyone, and expect a blog post about ZIP compression on Windows in the future.