[MUSIC] Hello everyone. Welcome to Big Data and Language. Today, we're going to talk about the lemma. Have you ever heard of a lemma? If you haven't, or you're not sure what it means, don't worry about it. I'm going to explain everything one by one. Okay, so let's get started.
A lemma is a form of a word that appears as an entry in a dictionary and is used to represent all of its possible forms. This definition comes from the Cambridge Dictionary, and the term is used in morphology and lexicography. It might be easier to understand the definition if you think about looking up a word in the dictionary. For example, if you want to find the meaning of ears, then instead of ears you look up the word ear, right? So, ear is the lemma.
So, let's look at the other terms one by one. The first one is lexeme. A lexeme is the set of all the forms that share the same meaning. For example, take produce: produce, produces, and produced are all forms of one lexeme. And lemmatization is the process of determining the lemma for a given word. For example, mapping produces to produce, or produced to produce. We call this process lemmatization.
Okay, so let me give you more examples. If you look at see, saw, sees, and seeing, we can group those verb forms under the lemma see. Another example is write, wrote, written, and writing; we can group those forms under the lemma write. And as I already explained, produce, producing, and produced go under the lemma produce. So, now you should have a clear idea of what a lemma is.
Now let's move on to another term and compare lemma and stem. The stem is the part of the word that never changes, even when the word is morphologically inflected. Let me give you an example: produce, producing, and product.
They look different, but they are also similar. For produce and producing, the lemma is produce, while product has its own dictionary entry. All three, however, share the stem produc, that is, P-R-O-D-U-C. So we can say that the lemma is produce and the stem is P-R-O-D-U-C, which stays the same under any morphological inflection. Okay? Is that clear? Now, let's move on.
Let's think about why the lemma is important. Lemmas and stems are often used in corpus linguistics for determining word frequency. Counting by lemma is more representative than counting raw word forms. For example, depending on your research question, you might want to check the frequencies of produce, producing, and produced separately. More often, though, you will want one count for the lemma produce, which is more representative for a general research question.
Now another example. For clearing, cleared, or clears, the lemma is clear with the tag verb. And for clearer, the lemma is also clear, but the tag is adjective. So even though the lemma looks the same, it can carry a different part of speech depending on how the word is used in the sentence.
Okay, let me show you how to lemmatize your sentences or text data. For example, you can download the AntConc lemma list. The entries are in alphabetical order, starting with ones such as a and an. Forms such as abandons, abandoning, and abandoned are grouped under the lemma abandon. Once you have downloaded the lemma list, you load it into AntConc. I will show you a screenshot so you can check where to click and where to load the lemma list, okay? If you want to play with it, you might want to stop here, download AntConc and the lemma list, and give it a try. If not, then let's keep going.
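To make the difference between word counts and lemma counts concrete, here is a minimal Python sketch. It is only an illustration, not how AntConc works internally: the tiny `LEMMA_LIST` dictionary, the sample sentence, and the function names are all made up for this example, and a real lemma list is far larger.

```python
import re
from collections import Counter

# Hypothetical miniature lemma list: lemma -> inflected forms.
# A real list, such as one loaded into AntConc, has many more entries.
LEMMA_LIST = {
    "abandon": ["abandons", "abandoning", "abandoned"],
    "produce": ["produces", "producing", "produced"],
}

# Invert the list so every form (and the lemma itself) points to its lemma.
FORM_TO_LEMMA = {lemma: lemma for lemma in LEMMA_LIST}
for lemma, forms in LEMMA_LIST.items():
    for form in forms:
        FORM_TO_LEMMA[form] = lemma


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_counts(text):
    """Count raw word forms."""
    return Counter(tokenize(text))


def lemma_counts(text):
    """Count tokens after mapping each to its lemma; unknown words stay as-is."""
    return Counter(FORM_TO_LEMMA.get(tok, tok) for tok in tokenize(text))


sample = ("The factory produces steel. It produced less last year "
          "and is producing more now.")
print(word_counts(sample))   # produces, produced, producing counted separately
print(lemma_counts(sample))  # all three counted under the lemma produce
```

With raw word counts, each of the three forms appears once; with lemma counts, they add up under a single entry, which is usually what a general frequency question is after.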
So, you apply the lemma list in the Word List processing, right? Once you have downloaded and loaded the lemma list, the word list will show the lemmas.
Let me give you some other lemmatizers, that is, other tools you can use to lemmatize words. One is the NLTK WordNetLemmatizer. With this one you specify the part-of-speech tag, and different tags result in different lemmas. Let me give you an example. As a verb, saw has the lemma see. But even though it looks the same, saw is sometimes used as a noun in a sentence, and then its lemma is saw. Totally different meanings. Okay?
Now, another tool. The other one is the Stanford NLP LemmaProcessor. You can use this tool as well; it draws on Stanford NLP resources covering more than 70 languages. So, if you are interested not only in English but also in other languages, you might want to use this lemmatizer. For example, for the word Barack, the LemmaProcessor gives the lemma Barack, and for the word Obama, the lemma Obama. So take the sentence Barack Obama was born in Hawaii. Lemmatized, it becomes Barack Obama be bear in Hawaii, because was is lemmatized as be and born is lemmatized as its base form bear. Hawaii is still Hawaii, and in is in.
Okay, let me stop here. You might not have heard about lemmas before, but I hope this lecture has helped you understand what a lemma is and what kinds of tools you can use. Depending on your research question, lemmatizing your text data can be very important. Thank you for your attention.
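As a supplement to the point that different tags give different lemmas, here is a small Python sketch. It does not call NLTK or the Stanford tools; the `TAGGED_LEMMAS` table, the simplified tag names, and the `lemmatize` function are assumptions invented for this illustration, and the tags are supplied by hand rather than by a real tagger.

```python
# Hypothetical lookup table keyed by (lowercased word form, part-of-speech tag).
# The tag names are simplified labels chosen for this sketch.
TAGGED_LEMMAS = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("was", "VERB"): "be",
    ("born", "VERB"): "bear",
}


def lemmatize(word, tag):
    """Return the lemma for a tagged word, falling back to the word itself."""
    return TAGGED_LEMMAS.get((word.lower(), tag), word)


# The same surface form gets a different lemma depending on its tag.
print(lemmatize("saw", "VERB"))
print(lemmatize("saw", "NOUN"))

# Hand-tagged version of the example sentence from the lecture.
tagged_sentence = [
    ("Barack", "PROPN"), ("Obama", "PROPN"), ("was", "VERB"),
    ("born", "VERB"), ("in", "ADP"), ("Hawaii", "PROPN"),
]
lemmas = [lemmatize(word, tag) for word, tag in tagged_sentence]
print(" ".join(lemmas))
```

Words that are not in the table, such as the names and in, come back unchanged, which matches what the lecture describes for Barack, Obama, Hawaii, and in.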