Note: This is a work in progress. Treat this post as development notes: sources, links and references that I've found interesting. I might throw away half of it, or never finish the experiment.
What do ctypes, OpenCV-python and Pillow have in common, apart from being Python libraries? In my case they form a nice pack to begin tinkering with automation in less usual scenarios... like videogames.
After finding an amazing tutorial on how to teach an AI to drive cars in GTA V with Python, I thought I could start with something simpler, like an auto-clicker for those idle & clicker videogames.
I am using Pillow because its ImageGrab module is perfect for capturing screenshots (it takes a bounding box directly, so there is no need to crop anything), and it's a single line of code to transform the image into an optimized NumPy array... which is also the source for many OpenCV operations.
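A minimal sketch of that capture step, not the prototype's actual code: `GAME_BBOX`, `to_array` and `grab_frame` are names made up for illustration, and the bounding box values are placeholders. `ImageGrab.grab` takes a `bbox` of (left, top, right, bottom); on Linux it needs a running X display and a reasonably recent Pillow.

```python
import numpy as np
from PIL import Image, ImageGrab

# Hypothetical capture region in screen pixels: (left, top, right, bottom).
GAME_BBOX = (0, 0, 800, 600)


def to_array(image):
    """Convert a Pillow image into an RGB uint8 array shaped (height, width, 3)."""
    return np.asarray(image.convert("RGB"))


def grab_frame(bbox=GAME_BBOX):
    """Capture only the given screen region and return it as a NumPy array."""
    # Requires a display; the returned channels are in RGB order,
    # while OpenCV functions default to BGR.
    return to_array(ImageGrab.grab(bbox=bbox))
```

Calling `grab_frame()` inside a desktop session should return an array that can be passed straight to OpenCV, as long as the colour conversion used afterwards starts from RGB rather than BGR.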
Regarding OpenCV, I'm just learning it as I go, half by reading its tutorials and half by directly playing with things I've seen in the GTA V AI article, like changing colorspaces and Canny edge detection. I am still toying with which approach is better (grayscale or HSV), but I already have a working prototype that draws the edges of coins as a black & white mask, and then by reading pixel data I'm able to pinpoint them on screen.
Handling input is actually becoming the hardest part, as I want to build it for Linux (there are lots of articles on ctypes and Windows DirectInput) and I'm not familiar with the X Window System. Thankfully the X11 library documentation lists all the calls, and I have already implemented querying the mouse position (XQueryPointer) and moving the pointer wherever I want (XWarpPointer), so I can see where it detects entities, which is better than just reading raw X,Y coordinates.
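For reference, this is roughly what those two Xlib calls look like through ctypes. It is a sketch, not the code from my prototype: the helper names (`load_xlib`, `get_pointer`, `move_pointer`, `demo`) are invented here, and it assumes libX11 can be found on the system and that an X display is available when `demo()` is called.

```python
import ctypes
import ctypes.util


def load_xlib():
    """Load libX11 and declare the signatures of the calls used below."""
    path = ctypes.util.find_library("X11")
    if path is None:
        raise OSError("libX11 not found")
    xlib = ctypes.CDLL(path)
    xlib.XOpenDisplay.argtypes = [ctypes.c_char_p]
    xlib.XOpenDisplay.restype = ctypes.c_void_p        # Display *
    xlib.XDefaultRootWindow.argtypes = [ctypes.c_void_p]
    xlib.XDefaultRootWindow.restype = ctypes.c_ulong   # Window is an unsigned long
    xlib.XQueryPointer.argtypes = [
        ctypes.c_void_p, ctypes.c_ulong,
        ctypes.POINTER(ctypes.c_ulong), ctypes.POINTER(ctypes.c_ulong),
        ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int),
        ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int),
        ctypes.POINTER(ctypes.c_uint),
    ]
    xlib.XQueryPointer.restype = ctypes.c_int
    xlib.XWarpPointer.argtypes = [
        ctypes.c_void_p, ctypes.c_ulong, ctypes.c_ulong,
        ctypes.c_int, ctypes.c_int, ctypes.c_uint, ctypes.c_uint,
        ctypes.c_int, ctypes.c_int,
    ]
    xlib.XFlush.argtypes = [ctypes.c_void_p]
    xlib.XCloseDisplay.argtypes = [ctypes.c_void_p]
    return xlib


def get_pointer(xlib, display, root):
    """Return the pointer position as (x, y) relative to the root window."""
    root_ret, child_ret = ctypes.c_ulong(), ctypes.c_ulong()
    root_x, root_y = ctypes.c_int(), ctypes.c_int()
    win_x, win_y = ctypes.c_int(), ctypes.c_int()
    mask = ctypes.c_uint()
    xlib.XQueryPointer(
        display, root, ctypes.byref(root_ret), ctypes.byref(child_ret),
        ctypes.byref(root_x), ctypes.byref(root_y),
        ctypes.byref(win_x), ctypes.byref(win_y), ctypes.byref(mask),
    )
    return root_x.value, root_y.value


def move_pointer(xlib, display, root, x, y):
    """Move the pointer to absolute root-window coordinates (x, y)."""
    # A source window of 0 (None) with the root as destination makes
    # the destination coordinates absolute.
    xlib.XWarpPointer(display, 0, root, 0, 0, 0, 0, x, y)
    xlib.XFlush(display)  # send the request to the X server now


def demo():
    """Print the current pointer position, then move it to (100, 100)."""
    xlib = load_xlib()
    display = xlib.XOpenDisplay(None)
    if not display:
        raise SystemExit("cannot open X display")
    root = xlib.XDefaultRootWindow(display)
    print(get_pointer(xlib, display, root))
    move_pointer(xlib, display, root, 100, 100)
    xlib.XCloseDisplay(display)
```

Declaring `argtypes` and `restype` is the important design choice here: without them ctypes treats return values as C ints, which can truncate the `Display *` pointer on 64-bit systems.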
I am still missing how to send mouse clicks, and I'd like to play with OpenCV a bit more to try to really detect items within the frame (right now it is just raw array comparisons).
I don't know whether to publish the code, partly to avoid making it easier to cheat in those games (which in the end you play for free) and partly because, except for the input handling code, everything else is really basic operations with each library that you can find in their respective official documentation or tutorials.
In any case, here's one screenshot of a debug frame mask with edge detection. The game was running and already past the equivalent frame, but the goal is to automatically click to pick up all gold coins enemies leave upon dying: