AI: The Good and The Bad

As a data scientist, I have always had AI as a part of my work, in more ways than one. I create machine learning models from scratch to solve interesting problems, and I use pretrained models to help me in the work that I’m doing. The automation and problem-solving capabilities that AI provides are amazingly beneficial, and at times it seems like the possibilities are never-ending. However, it’s not perfect. It can make mistakes, and when you look closer at what it can do, there are limitations. I feel that it’s good to be excited about AI, as long as we stay excited with a healthy dose of skepticism.

The models I train and use for predictive analytics at my work are heavily focused on natural language processing. The issue with text data is that computers don’t understand text, so we have to turn the text into numbers that they can handle easily. There are a few ways of doing this, but the most common one is to feed the text into a large language model, the same family of models that ChatGPT is built on. A large language model that I frequently use is called BERT. BERT was one of the first LLMs, and many variations of it are available for download on Hugging Face. It’s fairly simple to use. You import a Python library, such as sentence-transformers, that can load the model, and then you encode your text data with the loaded model. The model encodes each piece of text into a fixed-length vector of numbers. These vectors can then be used in more basic machine learning models like a logistic regression model or a random forest model. The basic machine learning models take the vector as an input and output a prediction.

A more concrete example would be taking a bunch of tweets, which are text data, and feeding them into BERT. For each tweet, BERT will output a vector of fixed length. Those vectors are then fed into a logistic regression model that we train to predict whether the tweet has a happy tone or a sad tone. This is called sentiment analysis.

This might sound like magic, and at times it feels like it! But once you learn a bit of the math behind how the models work, it becomes less intimidating. Training the more basic machine learning models that use the text data is both a science and an art. You will never have 100% accuracy with your predictions, so these models can’t be used for problems that require 100% accuracy. This is just the nature of AI, and working within those limitations can be tricky. It’s not a solution that will work for every problem.
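To make the tweet example a little more concrete, here is a minimal sketch of the two stages: an encoder that turns text into fixed-length vectors, and a logistic regression trained on those vectors. It rests on several assumptions of mine. The model name all-MiniLM-L6-v2 (a compact BERT-style model) is only an illustrative choice, the function names are hypothetical, and the logistic regression is written out by hand in plain Python to show the math rather than to suggest this is how it is done at work.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def load_sentence_encoder(
    model_name: str = "all-MiniLM-L6-v2",  # assumed model, not named in the post
) -> Callable[[List[str]], List[List[float]]]:
    """Return a function mapping a list of texts to fixed-length vectors.

    Assumes the sentence-transformers package is installed. The import is
    done lazily so the rest of this sketch runs without it.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return lambda texts: model.encode(texts).tolist()


def sigmoid(z: float) -> float:
    """Squash any real number into a probability between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_happy_probability(x: Vector, weights: Vector, bias: float) -> float:
    """Probability that the text behind vector x has a happy tone."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(
    X: Sequence[Vector],
    y: Sequence[int],  # 1 = happy, 0 = sad
    learning_rate: float = 0.5,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit weights and bias with plain batch gradient descent."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, label in zip(X, y):
            error = predict_happy_probability(x, weights, bias) - label
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        # Step against the average gradient of the log loss.
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias
```

With the real encoder, usage would look roughly like `encode = load_sentence_encoder()`, then `weights, bias = train_logistic_regression(encode(tweets), labels)`. In practice a library implementation such as scikit-learn’s LogisticRegression would normally replace the hand-written training loop; it is spelled out here only because the update rule is short enough to read in one sitting.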

I’ve used AI to help me develop and become a better programmer as well. I was recently working on a web-based application for logging data and storing files. The application itself was broken into two parts, the file storage part and the data logging part. Each part ran in its own Docker container, and the two containers needed to communicate. I deployed both containers, but they couldn’t communicate with each other even though they were hosted on the same server. So I did what every good programmer does these days and asked ChatGPT how to fix the issue. It walked me through the steps of configuring Docker’s bridge network so that the two containers could see each other and I could send API requests between them. ChatGPT also provided me with links to documentation that helped me do a deep dive afterward, and I became more familiar with how networking works in the Docker ecosystem. This is just one example of many where ChatGPT has helped me with quick solutions to problems I’m struggling with. However, it’s not always perfect. There are times when it gives me invalid answers to development problems, answers that I’ve already come across by searching around online, so nothing it provides is of help. Ultimately this is its biggest limitation: it can only work from the data it has been trained on.
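For readers who haven’t run into this, here is a rough sketch of one common way to get two containers on the same host talking: put both on a user-defined bridge network, where Docker lets containers reach each other by name. I’m not claiming this is exactly what ChatGPT suggested. The network name, container names, port, and path below are all hypothetical, and the script only prints the docker commands unless you turn off the dry run.

```python
import subprocess
from typing import List


def bridge_network_commands(network: str, containers: List[str]) -> List[List[str]]:
    """Build the docker CLI calls that create a user-defined bridge network
    and attach each already-running container to it."""
    commands = [["docker", "network", "create", "--driver", "bridge", network]]
    for name in containers:
        commands.append(["docker", "network", "connect", network, name])
    return commands


def service_url(container: str, port: int, path: str = "/") -> str:
    """URL one container can use to call another on the same user-defined
    bridge network, where the container name resolves as a hostname."""
    return f"http://{container}:{port}{path}"


def apply(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands, or run them when dry_run is False."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)


# Hypothetical names for the two halves of the application.
commands = bridge_network_commands("app-net", ["file-storage", "data-logger"])
apply(commands)  # dry run: only prints the docker commands
print(service_url("file-storage", 8000, "/health"))
```

The design point is that name-based lookup between containers is a feature of user-defined bridge networks, so the data logging container can send its API requests to the file storage container by name instead of a hard-coded IP address.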

In summary, AI is great when you need solutions to problems that don’t require 100% accuracy, or when you’re looking for a quick solution to a simple problem that is already well documented on the internet. Even then, it has been known to make up answers from time to time. So it is a good starting point, but you should always be willing to take its answers and do your own research with them.
