{"id":20,"date":"2023-02-09T07:48:33","date_gmt":"2023-02-09T07:48:33","guid":{"rendered":"https:\/\/blogs.oregonstate.edu\/kendevoe\/?p=20"},"modified":"2023-02-09T07:48:33","modified_gmt":"2023-02-09T07:48:33","slug":"transformers-who","status":"publish","type":"post","link":"https:\/\/blogs.oregonstate.edu\/kendevoe\/2023\/02\/09\/transformers-who\/","title":{"rendered":"Transformers who?"},"content":{"rendered":"\n<p>Transformers Who?<\/p>\n\n\n\n<p>This last week has seen what used to be a subdued research battle between top tech companies like Google and Microsoft go public in a big way. By now everyone and their cousin have heard of <a href=\"https:\/\/openai.com\/blog\/chatgpt\/\">ChatGPT<\/a> and how it will change everything. We\u2019ve seen launch presentations from Microsoft and Google about how they are building and using the best of AI for search. The outcomes of these new technologies are incredibly important for these companies, highlighted by the 100 billion dollar drop in Alphabet\u2019s value after Google\u2019s Bard chatbot gave a <a href=\"https:\/\/www.forbes.com\/sites\/jonathanponciano\/2023\/02\/08\/alphabet-stock-loses-100-billion-after-new-ai-chatbot-seemingly-gives-wrong-answer-in-ad\/\">wrong answer<\/a> in its demo\u2026 With this backdrop, I found one portion of Google\u2019s presentation interesting. They splashed a 2017 research paper on the screen, claiming basically that \u2018we were the ones that revolutionized this whole AI thing\u2019. What was that paper, and what are the \u2018Transformers\u2019 they talked about?<\/p>\n\n\n\n<p>To find out we are going to need to go on a tour of the greatest hits of neural network architectures. I know, can\u2019t wait right? 
I promise it\u2019s not as bad as it sounds\u2026 We have four to get through, and I\u2019ll go quick:<\/p>\n\n\n\n<ol class=\"wp-block-list\" type=\"1\">\n<li>Basic Neural Networks<\/li>\n\n\n\n<li>Convolutional Neural Networks (CNNs)<\/li>\n\n\n\n<li>Recurrent Neural Networks (RNNs)<\/li>\n\n\n\n<li>Transformers!<\/li>\n<\/ol>\n\n\n\n<p>Let\u2019s start with a basic neural net. This is likely the picture you\u2019ve seen if you have found yourself late-night searching YouTube for \u201cwhat is a neural network anyway?\u201d. It\u2019s defined by basic input nodes all connected to output nodes with some layers in between for fun. Not all that different from a subway sandwich.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"274\" height=\"364\" src=\"https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/6431\/files\/2023\/02\/NN.png\" alt=\"\" class=\"wp-image-21\" srcset=\"https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/6431\/files\/2023\/02\/NN.png 274w, https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/6431\/files\/2023\/02\/NN-226x300.png 226w\" sizes=\"auto, (max-width: 274px) 100vw, 274px\" \/><figcaption class=\"wp-element-caption\">Your basic everyday neural network. 
<a href=\"https:\/\/en.wikipedia.org\/wiki\/Neural_network\">https:\/\/en.wikipedia.org\/wiki\/Neural_network<\/a><\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"344\" height=\"344\" src=\"https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/6431\/files\/2023\/02\/Sandwich.jpg\" alt=\"\" class=\"wp-image-22\" srcset=\"https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/6431\/files\/2023\/02\/Sandwich.jpg 344w, https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/6431\/files\/2023\/02\/Sandwich-300x300.jpg 300w, https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/6431\/files\/2023\/02\/Sandwich-150x150.jpg 150w\" sizes=\"auto, (max-width: 344px) 100vw, 344px\" \/><figcaption class=\"wp-element-caption\">Sandwich for reference. <a href=\"https:\/\/www.subway.com\/en-us\/\">https:\/\/www.subway.com\/en-us\/<\/a><\/figcaption><\/figure>\n\n\n\n<p>OK, one down. How about CNNs? Well, those are similar to our basic footlong sandwich neural network, but the input is handled a little differently. Instead of naively connecting all the inputs equally to the middle sandwich layers, these networks more intelligently look at groups of inputs. Computer vision is a common area for these. Instead of just reading the value of each pixel independently, this network considers a patch of pixels all at once, like how you focus on one particular area of a picture. 
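To make that patch idea concrete, here is a minimal sketch (not from the post, just an illustration in plain numpy) of one convolution filter sliding over a tiny grayscale "image", so every output value summarizes a neighborhood of pixels rather than a single pixel:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide one small filter over the image, one patch at a time."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw]   # the group of pixels
            out[i, j] = np.sum(patch * kernel)  # one number per patch
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "picture"
kernel = np.ones((3, 3)) / 9.0                    # simple averaging filter
print(conv2d(image, kernel).shape)  # (3, 3): each output cell saw 9 pixels
```

A real CNN learns the kernel values during training instead of hard-coding an averaging filter, but the patch-at-a-time mechanics are the same.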
So CNNs are great where groups of inputs have meaning, like an area of a picture.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"760\" height=\"409\" src=\"https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/6431\/files\/2023\/02\/CNN.jpg\" alt=\"\" class=\"wp-image-23\" srcset=\"https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/6431\/files\/2023\/02\/CNN.jpg 760w, https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/6431\/files\/2023\/02\/CNN-300x161.jpg 300w\" sizes=\"auto, (max-width: 760px) 100vw, 760px\" \/><figcaption class=\"wp-element-caption\">Believe it or not, a basic CNN. <a href=\"https:\/\/towardsdatascience.com\/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53\">https:\/\/towardsdatascience.com\/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53<\/a><\/figcaption><\/figure>\n\n\n\n<p>Over halfway now. RNNs are all about their first word: recurrence. An RNN is basically one layer that continually calls itself, feeding its outputs back to its inputs. Why do this? Well, it turns out this is a great strategy for any data in a time series. For example, with stock prices, the previous value goes into the network layer, and the output is the next predicted stock price. Just rinse and repeat for the series. Or with text, the last word is the input, and the next word is the output. The great thing about these networks is they are easily expandable like a slinky. Short text? 6 inch NN sandwich. Long text? Foot long NN sandwich. 
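That rinse-and-repeat loop can be sketched in a few lines of numpy. This is an illustration, not production code: the weights are random placeholders, and the point is just that the same single layer handles a sequence of any length:

```python
import numpy as np

rng = np.random.default_rng(0)
W_x = rng.normal(size=(4, 4)) * 0.1  # input -> hidden weights
W_h = rng.normal(size=(4, 4)) * 0.1  # previous hidden -> hidden weights

def run_rnn(sequence):
    """One layer called over and over, feeding its output back in."""
    h = np.zeros(4)  # hidden state starts empty
    for x in sequence:
        h = np.tanh(x @ W_x + h @ W_h)  # same layer, every step
    return h

short = [rng.normal(size=4) for _ in range(3)]   # 6-inch sequence
longer = [rng.normal(size=4) for _ in range(10)]  # footlong sequence
print(run_rnn(short).shape, run_rnn(longer).shape)  # (4,) (4,)
```

Same two weight matrices either way; the slinky stretches to fit the sequence.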
You get the idea, or maybe this is not making any sense, but I commend you for reading this far anyway.<\/p>\n\n\n\n<p>RNN Reference: <a href=\"http:\/\/karpathy.github.io\/2015\/05\/21\/rnn-effectiveness\/\">http:\/\/karpathy.github.io\/2015\/05\/21\/rnn-effectiveness\/<\/a><\/p>\n\n\n\n<p>And finally\u2026 Transformers! As Google was keen to point out, these were developed by Google in 2017 and they\u2019ve taken over as the neural network architecture of choice for cutting-edge AI in many different areas. So what are they? Well, transformers rely on a basic mechanism called \u2018attention\u2019. Details can be found in the aptly named \u2018<a href=\"https:\/\/arxiv.org\/abs\/1706.03762\">Attention is All You Need<\/a>\u2019 research article. (Quick aside, InceptionNet gets my gold star award for the best-named neural network, from the paper <a href=\"https:\/\/arxiv.org\/pdf\/1409.4842.pdf\">Going Deeper with Convolutions<\/a>, also by Google.) Instead of looking at only portions of the input, or reading the input recurrently word by word, attention just looks at the whole thing: it reads in the entire input at once. I know what you\u2019re thinking, isn\u2019t that where we started with the plain subway sandwich network?<\/p>\n\n\n\n<p>Yes, you are absolutely correct. Except one thing the attention mechanism does is learn where to focus its \u2018attention\u2019 within that input. Some of its internal sandwich layers are dedicated to learning which parts of the input are important for collecting certain types of information. This is somewhat similar to how a CNN will look at certain areas of an image at a time. Except transformers look at the whole thing and dynamically learn what to focus on within that image or string of text. Basically, as a transformer network trains, it gets better and better at recognizing what types of input are coming in and how much emphasis to place on a given input. 
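The \u201cread everything, then decide where to focus\u201d step is usually implemented as scaled dot-product attention. Here is a minimal numpy sketch (an illustration under the paper\u2019s formula, not the paper\u2019s actual code): every word compares itself against every other word, and the softmax weights say how much attention each pair gets:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention over a whole sequence at once."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # word-vs-word relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax rows
    return weights @ V  # each output is a weighted mix of ALL inputs

X = np.random.default_rng(1).normal(size=(5, 8))  # 5 "words", 8 dims each
out = attention(X, X, X)  # self-attention: Q, K, V all come from the input
print(out.shape)  # (5, 8): one updated vector per word
```

In a real transformer, Q, K, and V are learned projections of the input rather than the raw input itself; that learning is exactly the \u201cwhere to focus\u201d part described above.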
In practice, you get many of these attention mechanisms working together, each trying to answer a specific question about the input. For example, one might look at what the topic of a sentence is, while another looks at what event is taking place. All of this results in a very cool-sounding Multi-Headed Transformer Neural Network.<\/p>\n\n\n\n<p>That\u2019s it. Sorry I lied about the whole quick thing, but we had a lot of ground to cover. Long story short, the basic architectures of neural networks continue to evolve. Transformers take the current title of the latest and greatest, until the next sandwich comes along.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Transformers Who? This last week has seen what used to be a subdued research battle between top tech companies like Google and Microsoft go public in a big way. By now everyone and their cousin have heard of ChatGPT and how it will change everything. We\u2019ve seen launch presentations from Microsoft and Google about how 
[&hellip;]<\/p>\n","protected":false},"author":13113,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-20","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/blogs.oregonstate.edu\/kendevoe\/wp-json\/wp\/v2\/posts\/20","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.oregonstate.edu\/kendevoe\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.oregonstate.edu\/kendevoe\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.oregonstate.edu\/kendevoe\/wp-json\/wp\/v2\/users\/13113"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.oregonstate.edu\/kendevoe\/wp-json\/wp\/v2\/comments?post=20"}],"version-history":[{"count":1,"href":"https:\/\/blogs.oregonstate.edu\/kendevoe\/wp-json\/wp\/v2\/posts\/20\/revisions"}],"predecessor-version":[{"id":25,"href":"https:\/\/blogs.oregonstate.edu\/kendevoe\/wp-json\/wp\/v2\/posts\/20\/revisions\/25"}],"wp:attachment":[{"href":"https:\/\/blogs.oregonstate.edu\/kendevoe\/wp-json\/wp\/v2\/media?parent=20"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.oregonstate.edu\/kendevoe\/wp-json\/wp\/v2\/categories?post=20"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.oregonstate.edu\/kendevoe\/wp-json\/wp\/v2\/tags?post=20"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}