My Generative AI and ML usages (as of 2023)

While writing my 2023 wrap-up post, I noticed that the generative AI and machine learning scenarios I actively use (the so-called "AI") are scattered across different posts. It is a good idea to summarize them, because my general advice from 2023 echoes what has been said a lot during the year: You should experiment and test with AI technologies.

I will focus only on technologies/APIs, although an increasing number of tools are "AI-assisted" now, from Spotify's radio DJ to Grammarly/LanguageTool.

GitHub Copilot

I use it as an auto-complete and smart linting helper. It allows me to switch between languages almost on the fly: JS/TS, Python, Go and Bash scripts, in my case.

Depending on the task, it is either incredibly useful or a terrible mess for more serious code generation. It took me less than 15 minutes to build some React-based animations and SCSS styling changes. On the other hand, even with the trick of leaving related source files open in other tabs (as context), Copilot at times struggles to understand what I want to build, even with three-line descriptive comments. But it is still worth a try, as it is a decent speed boost for non-complex tasks.

A related task where it is proving excellent is writing tests for your code. It probably depends on the language, but for JavaScript, I keep the file containing the class under test open and write the first one or two tests myself. Then I just write the starting `it("blablabla"` line, and it auto-completes it with a perfectly valid body based on the test description.
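As a rough sketch of that workflow: the example below assumes Node's built-in `node:test` runner as a stand-in for whichever framework you use (Jest, Mocha...), and the `ShoppingCart` class with its `add` and `total` methods is purely hypothetical. The first two tests are hand-written to establish the pattern; the third shows the kind of body that tends to get suggested once only the `it("...")` description has been typed.

```javascript
const { describe, it } = require("node:test");
const { strictEqual } = require("node:assert");

// Hypothetical class under test. In practice it lives in its own file,
// kept open in the editor so Copilot can use it as context.
class ShoppingCart {
  constructor() {
    this.items = [];
  }

  add(name, price, quantity = 1) {
    this.items.push({ name, price, quantity });
  }

  total() {
    return this.items.reduce((sum, item) => sum + item.price * item.quantity, 0);
  }
}

describe("ShoppingCart", () => {
  // Hand-written tests that establish the naming and assertion style.
  it("starts with a total of zero", () => {
    strictEqual(new ShoppingCart().total(), 0);
  });

  it("adds the price of a single item to the total", () => {
    const cart = new ShoppingCart();
    cart.add("book", 12);
    strictEqual(cart.total(), 12);
  });

  // Only the description was typed here; the body is the sort of
  // completion to expect, and it still deserves a quick review.
  it("multiplies the price by the quantity", () => {
    const cart = new ShoppingCart();
    cart.add("pen", 2, 5);
    strictEqual(cart.total(), 10);
  });
});
```

The more descriptive the test name, the better the suggested body tends to match the intent.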

The GitHub Copilot Chat feature is excellent, as it removes the need to ask GPT to roleplay a developer, etc., and can give back examples directly adapted to your programming language and even to the source files you have open. Try it if you haven't yet. For technical questions, it became my new "Stack Overflow".

ChatGPT

Learning languages: for English, I'd rather go to specialized sites, but for Swedish, it is a fabulous quick learning tool.

It is my new Google + Wikipedia. It might not always be correct, but as long as I ask about topics that I know exist, the information has always been correct (I still fact-check everything, just in case). Otherwise, I directly got an "up to my training data of xxxx, I don't know about that" [1].

I also use it a lot for random, trivia-like questions. My "trick" is to consider whether the answer to my question is available somewhere on the public internet in trusted places. I'm confident it will adequately answer all my nature, physics, and similar science questions, but beware of more niche topics. For example, it was able to very precisely answer the question "at the end of the first Dune book, which of the characters are still alive?". But then, when asked something about the second book, it confused some facts with other volumes and hallucinated.

I've also noticed it is prone to be too agreeable, both here and in Copilot-related tasks. If you tell it that it was wrong, it will try very hard to please you, often inverting the previous response even if the new outcome is incorrect. Because of this, some basic prompting skills are still required to some extent, but I think this is a matter of pure iteration and refinement until those details get solved.

Another use is to "spice up" specific messages: e.g. we hosted an internal hackathon at work, and I used it to "cheer up" all the announcements/updates I had to write. It might tend to overuse emojis, but you can also tell it not to use any. It is the same with making a sentence or paragraph more formal or more technical.

My last usage, often API-driven, is to summarize content. From YouTube videos to complete articles and PDFs, it is excellent at interpreting data and, via the API, you can set its temperature to 0 so that it will be less creative (aka less prone to hallucinate). The resulting summary is sometimes repetitive and dull, but digesting a 45-minute video into a three-paragraph text is a great time-saving technique.
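A minimal sketch of such an API-driven summary, assuming OpenAI's Chat Completions endpoint and Node 18+ (for the global `fetch`). The model name, the prompt wording, and the `buildSummaryRequest`/`summarize` helpers are illustrative assumptions, not the exact setup behind this post. Note that the text itself (a video transcript, the contents of a PDF) has to be extracted beforehand.

```javascript
// Assumptions: the URL and body fields follow OpenAI's Chat Completions API;
// the model name and the system prompt are illustrative choices.
const OPENAI_CHAT_URL = "https://api.openai.com/v1/chat/completions";

// Builds the fetch options separately, so the payload can be inspected offline.
function buildSummaryRequest(text, apiKey) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      temperature: 0, // lowest setting: least "creative" output
      messages: [
        { role: "system", content: "Summarize the user's text in three short paragraphs." },
        { role: "user", content: text },
      ],
    }),
  };
}

// Sends the request and returns the summary text. Needs a valid API key.
async function summarize(text, apiKey) {
  const response = await fetch(OPENAI_CHAT_URL, buildSummaryRequest(text, apiKey));
  if (!response.ok) {
    throw new Error(`Summary request failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}
```

Usage would look like `summarize(transcript, process.env.OPENAI_API_KEY).then(console.log)`, where `transcript` is whatever text you extracted.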

DALL-E 3

"In the past" (as if it were a decade ago 😉), I played with DALL-E and the first versions of Stable Diffusion. A nice toy, and an incredible helper for specific tasks like assisting in Dungeon Mastering RPG sessions (although Midjourney is probably better for both player and NPC portraits), but it was either too prone to flaws and imperfections or too directly copying artists' styles.

But then, Microsoft quickly added DALL-E 3 to Bing as soon as OpenAI announced it, and wow, the results are way better now. The images have fewer artifacts/errors (but look carefully; they still appear at times!), it understands the prompts way better [2], and the quality of the images can be outstanding with fewer attempts.

As an example, with decent but far from advanced prompts, I was able to help a family member design a small poster. It took us 30 minutes of back and forth through an instant messaging app: me generating the images and sending them, her replying with the best one and the things to tweak/change. The results were quite good; we didn't need to buy any paid clipart or ask for professional help.

I've also used the OpenAI DALL-E 3 API. There, my experience is that the results are prone to flaws (e.g., missing body parts happen a lot), so I would wait before fully automating image generation and, for now, place an intermediate human review and approval step in the workflow.
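One way such a review step could look, as a sketch only: the `ImageReviewQueue` class and the `generateForReview` helper are hypothetical names, the queue is in-memory for brevity, and the request assumes OpenAI's image generation endpoint plus Node 18+ for the global `fetch`. Generated images stay "pending" until a person approves or rejects them.

```javascript
// Hypothetical in-memory review queue: nothing is published automatically.
class ImageReviewQueue {
  constructor() {
    this.entries = new Map();
    this.nextId = 1;
  }

  // Registers a freshly generated image as pending and returns its id.
  submit(prompt, imageUrl) {
    const id = this.nextId++;
    this.entries.set(id, { prompt, imageUrl, status: "pending" });
    return id;
  }

  // Records the reviewer's decision; only pending images can be reviewed.
  review(id, approved) {
    const entry = this.entries.get(id);
    if (!entry || entry.status !== "pending") {
      throw new Error(`Image ${id} is not awaiting review`);
    }
    entry.status = approved ? "approved" : "rejected";
    return entry;
  }

  // Only approved images are handed to the rest of the workflow.
  approved() {
    return [...this.entries.values()].filter((entry) => entry.status === "approved");
  }
}

// Generates one image and parks it in the queue instead of publishing it.
// Assumes OpenAI's image generation endpoint and a valid API key.
async function generateForReview(queue, prompt, apiKey) {
  const response = await fetch("https://api.openai.com/v1/images/generations", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model: "dall-e-3", prompt, n: 1, size: "1024x1024" }),
  });
  if (!response.ok) {
    throw new Error(`Image request failed with status ${response.status}`);
  }
  const data = await response.json();
  return queue.submit(prompt, data.data[0].url);
}
```

A real setup would persist the queue and show the pending images to the reviewer, but the important part is that publishing only ever reads from `approved()`.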

Closing Words

Usefulness aside, which is coming close in concept to the AI assistants of many science-fiction books [3], what impresses me most is the mathematical and technical complexity. LLMs are a huge multi-dimensional Markov chain (oversimplification!), statistically queried, but the text-to-image diffusion models feel mind-blowing to me. It feels magical that we have devised a way to teach computers to convert image patterns and fragments to series of numbers, and even more incredible if we factor in that they learn everything from shapes ("a cat") and fragments ("with tiger stripes") to colors ("orange") and drawing styles that apply to the whole image ("pixel art", "coloring book"...).
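To make the Markov chain oversimplification concrete, here is a toy first-order (word-to-next-word) chain. It is only an illustration of the "statistically queried" idea: real LLMs use neural networks conditioned on a long context, not a lookup table, and the `buildChain` and `generate` names are made up for this sketch.

```javascript
// Builds a transition table: word -> list of the words observed right after it.
// Repeated followers appear multiple times, which encodes their frequency.
function buildChain(text) {
  const words = text.split(/\s+/).filter(Boolean);
  const chain = new Map();
  for (let i = 0; i < words.length - 1; i++) {
    if (!chain.has(words[i])) {
      chain.set(words[i], []);
    }
    chain.get(words[i]).push(words[i + 1]);
  }
  return chain;
}

// Walks the chain, picking each next word in proportion to how often it
// followed the current one. `random` is injectable to make runs repeatable.
function generate(chain, start, maxWords, random = Math.random) {
  const output = [start];
  let current = start;
  while (output.length < maxWords && chain.has(current)) {
    const candidates = chain.get(current);
    current = candidates[Math.floor(random() * candidates.length)];
    output.push(current);
  }
  return output.join(" ");
}

const chain = buildChain("the cat sat on the mat");
console.log(generate(chain, "the", 4));
```

Each word is chosen only from what followed the previous word in the training text, which is the "query the statistics" part of the analogy.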

[1] Pseudo-offtopic note: ChatGPT 3.5 now reports having training data until Jan 2022, so it has advanced from the original Sept 2021 mark.

[2] DALL-E 3 overrides and enriches your prompts. Often, that is good, but if you want more fine-grained control, you can instruct it not to rewrite them.
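A sketch of what that instruction can look like through the API. The prefix wording is an assumption modeled on OpenAI's guidance for DALL-E 3 and reduces rewriting rather than guaranteeing it; the helper names are hypothetical. The API response reports the prompt that was actually used in a `revised_prompt` field, which is handy to log and compare.

```javascript
// Assumption: prefix wording modeled on OpenAI's guidance for DALL-E 3.
// It discourages prompt rewriting but does not guarantee it.
const NO_REWRITE_PREFIX =
  "I NEED to test how the tool works with extremely simple prompts. " +
  "DO NOT add any detail, just use it AS-IS: ";

// Builds the JSON body for POST https://api.openai.com/v1/images/generations.
function buildLiteralPromptBody(prompt) {
  return {
    model: "dall-e-3",
    prompt: NO_REWRITE_PREFIX + prompt,
    n: 1,
    size: "1024x1024",
  };
}

// Extracts the prompt DALL-E 3 reports having used, or null if absent.
function revisedPromptOf(responseData) {
  const first = responseData.data && responseData.data[0];
  return first && first.revised_prompt ? first.revised_prompt : null;
}
```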

[3] Execution is still in the early stages, but I have zero doubt we'll get to a perfectly working speech-driven "user interface." Whether it is a single assistant or multiple ones, I'll leave it to futurologists.

Tags: Development

My Generative AI and ML usages (as of 2023) article, written by Kartones. Published @