Vector Database

My favorite piece of technology that we are using for our project is the vector database. Vector databases play a huge role in the AI/ML landscape. Using this technology will give my team members and me insight into how AI/ML is developed today, and we can use it to develop projects of our own in the future.

What is a Vector Database?
A vector database stores your structured or unstructured data as vectors (lists of numbers, often called embeddings) so that each piece of data sits close to the data it is related to.

How does it work?
Imagine a vector database as a football field. Each piece of data is positioned somewhere on the field based on its qualities and traits. We determine those qualities and traits by running the data through a vectorization process. We can think of the vectorization process as a black box that gives us football field coordinates determined by the traits of that piece of data. (A real vector has far more than two coordinates, but the idea is the same.) This becomes useful because the vectorization process places objects with similar traits near each other on the field. For example, a puppy object would likely be close to a Great Dane object, since both share similar qualities. Once we plug in a large amount of data, the vector database becomes an invaluable asset: we can query it for the objects most similar to what we are looking for.
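To make the football field picture concrete, here is a minimal sketch in Python. The coordinates are hand-picked by me purely for illustration (a real system would get them from an embedding model), and the names FIELD, distance, and nearest are my own, not part of any particular vector database.

```python
import math

# Hypothetical hand-picked "football field" coordinates. A real system would
# get these from an embedding model, with far more than 2 dimensions.
FIELD = {
    "puppy": (10.0, 20.0),
    "great dane": (12.0, 23.0),
    "pug": (9.0, 18.0),
    "duck": (30.0, 25.0),
    "pickup truck": (90.0, 5.0),
}


def distance(a, b):
    """Straight-line (Euclidean) distance between two points on the field."""
    return math.dist(a, b)


def nearest(query, field, k=2):
    """Return the names of the k stored items closest to the query point."""
    ranked = sorted(field, key=lambda name: distance(field[name], query))
    return ranked[:k]


# Query with a point standing in for "something dog-like".
print(nearest((11.0, 21.0), FIELD))  # ['puppy', 'great dane']
```

The query never looks at names or labels. It only compares positions, which is why where the vectorization step places each object matters so much.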

Drawbacks?
Vector databases are not much use if the data they hold does not paint a full picture. For queries to be accurate and return the results we want, we need a lot of data to work with. The unfortunate side of vector databases is that our results are only as good as the data the database holds. A sparse football field is going to give inaccurate results, because a query returns whatever happens to be closest, even when nothing stored is actually similar. Imagine querying for pictures of pugs and getting back ducks. Although both are very cute, no one would consider that a successful query (unless you really love ducks, in which case you probably would not mind).
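The pug-and-duck problem can be sketched in a few lines. This is only an illustration under my own assumptions: the coordinates are made up, and best_match and best_match_within are hypothetical helper names. The distance cutoff is one possible way to notice a bad match, not a fix for missing data.

```python
import math

# Hypothetical 2-D coordinates for a sparse field with no dogs stored in it.
SPARSE_FIELD = {
    "duck": (30.0, 25.0),
    "pickup truck": (90.0, 5.0),
}


def best_match(query, field):
    """Return (name, distance) of the closest stored item, however far away."""
    name = min(field, key=lambda n: math.dist(field[n], query))
    return name, math.dist(field[name], query)


def best_match_within(query, field, max_distance):
    """Return the closest item's name, or None if it is farther than max_distance."""
    name, dist = best_match(query, field)
    return name if dist <= max_distance else None


pug_query = (9.0, 18.0)  # where a pug would sit if the field contained one

name, dist = best_match(pug_query, SPARSE_FIELD)
print(name, round(dist, 1))  # the duck wins only because nothing closer exists
print(best_match_within(pug_query, SPARSE_FIELD, max_distance=10.0))  # None
```

A plain nearest-neighbor query always returns something, so the large distance is the only hint that the duck is a poor answer.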

How could it be made better?
Vector databases are a new piece of technology that is still being adopted, so there is a lot of room for improvement. One thing that could be improved is the amount of storage each piece of data takes up. Vector embeddings take up a significant amount of space. So not only do vector databases need a lot of data, but each piece of data also takes up a decent amount of space, which does not make vector databases very cost effective. To level the playing field and make this technology more accessible to the typical developer, there needs to be more work on compression algorithms that further reduce the size of an embedding.
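To give a rough sense of the storage cost, here is a small sketch. The 1536-dimension size is an assumption of mine (sizes vary by embedding model), and the quantize and dequantize helpers are hypothetical. They show one simple, lossy idea: store each number as a 1-byte integer plus one shared scale factor instead of a 4-byte float. It is not how any specific database does it.

```python
from array import array

DIMENSIONS = 1536  # assumed embedding size; real models vary


def quantize(vector):
    """Map each float to a 1-byte integer code plus one shared scale factor."""
    scale = max(abs(x) for x in vector) / 127 or 1.0  # 1.0 guards all-zero vectors
    codes = array("b", (max(-127, min(127, round(x / scale))) for x in vector))
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the 1-byte codes."""
    return [c * scale for c in codes]


# A made-up embedding stored as 4-byte floats, with values between -1 and 1.
embedding = array("f", ((i % 7 - 3) / 3 for i in range(DIMENSIONS)))
codes, scale = quantize(embedding)

full_bytes = embedding.itemsize * len(embedding)  # 4 bytes per number
small_bytes = codes.itemsize * len(codes)         # 1 byte per number
print(full_bytes, small_bytes)

# The price of the savings: each recovered value is only approximately right.
worst_error = max(abs(a - b) for a, b in zip(embedding, dequantize(codes, scale)))
print(worst_error <= scale / 2 + 1e-9)
```

At 4 bytes per number, one 1536-dimension embedding is 6,144 bytes, so a million of them come to roughly 6 GB before any indexing overhead. The 1-byte version is a quarter of that, at the cost of a small rounding error in every coordinate.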
