So I had my brother-in-law visiting for the past couple weeks, and he’s young and excited about game making, so I did what everyone does. started a new project with him to work on.
Just a godot project, don’t really have much on it to be honest, but maybe it’ll get there, that or it’ll just sit as another unfinished project.
Him being over distracted me from most of my other projects.
Engine progress has been pretty quiet.
I do however have quite the development at work, so I’m going to need to step up my note taking and organizing game. I prefer to handwrite my notes on paper, since that helps me both to remember what I wrote, and let’s me take notes off to the side while I’m presenting, sharing, or typing something else on my computer.
This does mean a couple things though:
- My notes aren’t easy to search or sort
- my handwriting is horrendous, so being able to read my own writing is sometimes difficult, or nigh impossible
- if I want to convert it to text, then most standard out of box OCR things I would expect to not work very well.
I did look into something I’d heard of a bit called Rocketbook, as this looks to be the least expensive type of “smart notebook” starting at only about $34, and you get a notebook with sheets that can be erased with water as long as you use the special pens.
The issue, is their app’s OCR involves sending the scans to their servers, which in my mind is kind of a no-go. I want something that I know doesn’t create any type of security/privacy issue with what I’m working on, and will let me potentially use the solution for both work and for non-work things.
So basically I started a new project to train some ML models to do OCR on my handwriting. I took a intro to machine learning/AI class a long long time ago, and I’ve dabbled about with tensorflow in a past so it’s not all super new to me, but OCR is a somewhat complex topic and has multiple moving parts, so I’m following along with a couple of tutorials.
The gist of it is that you have to do two things (typically with two different models)
- Detect where the text is
- this usually spits out a bounding rectangle of each chunk of text in the image so that it can send a cropped image of just the text to the second model
- recognize what each chunk of text actually is
- this is what does the actual handwriting
There are actually pre-trained models you can find to do both steps, in fact there’s already a project using said models on Android to do the recognition completely on a phone, however those models are trained on computer printed/generated text, not handwriting, so I doubt they will be very accurate.
So my plan is to basically find models to do those things, and then create some sample writing data myself, so that I can train the models on my own writing instead of someone else’s. This is good because sometimes my words turn into squiggles when going quickly, so if I can write crap out, and then type it out so I know what it says, then feed it to the model, in theory I should be able to get some semblance of a working OCR model off my own handwriting.
I would need to likely find and train a detection model, and a recognition model. I plan to do that on desktop using python + jupyter lab (an in browser python editor/tool which is fairly nifty for data processing and can display matplotlib graphs and charts inline with the code blocks)
After that I need to convert each model to a tensorflow lite model that can be loaded by an Android app to run the models on a phone, so I could basically make an app to take a picture of a piece of paper or notebook page, and then turn it all into text.
I also haven’t decided what should happen for the images, I do sometimes have diagrams, and formatting is also a fun challenge, so I may have the app take the images and save them separate from the text, and/or I might have the images get put into a PDF along with the OCRed text data.
For now I’m following along with this tutorial. it seems fairly in depth on how to do things with explaining the code as you go, and is an easy format to just copy/paste into a jupyter notebook. This is just the handwriting recognition portion, and not the detection portion though, so I will likely need to figure out how to locate the text in the picture/page with a different model.
I’ve been eyeing a model called EAST for that, it seems more aimed at detecting text in the world and not on a page, but really I think that will just improve the robustness of the model for what I want.
So I guess it’s off to collecting a few thousand word/writing samples from myself to train this puppy on, tune in next time to see how long this project lasted!