This session is about quantifying the tone of an article. You'll learn a few techniques to measure the sentiment of text extracted from the media. We'll try to figure out whether a given text is optimistic, pessimistic, or anywhere in between, and we'll motivate that by showing real research examples.

The first approach is heavily derived from psychology. In fact, the researchers use the Harvard psychological dictionary, which lists 77 categories of states of mind: negative state of mind, state of mind of strength, passive state of mind, positive state of mind, and so on. Each of these categories is associated with terms that suggest that state of mind. Negative terms would be things like "disaster" and "awful"; positive terms would be things like "good" and "beautiful". We therefore have a mapping of words into states of mind.

What can we do with that? By counting the number of times each term from each list appears in a given text, we can say something about the tone of that text. If there are a lot of words associated with a negative state of mind in an article, that may suggest the article is actually negative; if there are a lot of positive words, that may suggest the article is positive.

There are a few ways to do it, so let's start with the simple one. The sentiment of article k can be estimated by counting the number of positive terms in the article, subtracting the number of negative terms in the same article, and dividing by the sum of positive and negative terms. That gives us a sentiment number in the range of minus one to plus one, and we can say something about the positiveness or negativeness of an article. But is it meaningful? Does it do the job? The first paper to look at that measure was Tetlock in 2007.
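The count-based measure just described can be sketched in a few lines. This is a minimal illustration, not the researchers' actual implementation, and the word lists here are tiny made-up stand-ins for the Harvard dictionary categories:

```python
# Minimal sketch of the dictionary-count sentiment measure:
# score = (positive count - negative count) / (positive count + negative count).
# The word lists are illustrative stand-ins, not the Harvard dictionary.

POSITIVE = {"good", "beautiful", "great"}
NEGATIVE = {"disaster", "awful", "bad"}

def sentiment(text: str) -> float:
    """Return (P - N) / (P + N) for an article; 0.0 if no terms match."""
    words = text.lower().split()
    p = sum(w in POSITIVE for w in words)
    n = sum(w in NEGATIVE for w in words)
    if p + n == 0:
        return 0.0
    return (p - n) / (p + n)

# One positive and two negative matches: (1 - 2) / 3 = -0.333...
print(sentiment("an awful disaster but a good recovery"))
```

By construction the score lands in [-1, +1]: an article using only positive terms scores +1, only negative terms scores -1.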
He actually looked back to 1984 and, every day, took one particular column in The Wall Street Journal, calculated the sentiment measure for that day, and tested whether that sentiment measure predicted returns over the next day. The results were mixed: negative terms were able to predict next-day returns significantly, but the positive terms did not add any value.

So what could be the problem with that approach? There could be many, but one could be that, because the dictionary is derived from psychology, it misses the context of the financial markets. I'll give you an example. "Liability" is associated with a very negative state of mind: if people feel liable, it says something about their state of mind; they may be depressed. In the context of the financial markets, however, liability is just a term. It's actually neutral. Every financial statement will have an account named liabilities, next to assets. Therefore, the list of terms taken from psychology needs to be adapted to take into account the financial-market context.

In fact, follow-up research by two researchers, McDonald and Loughran, updated the list of terms from the first study we showed and applied financial-market context to this list. The results were slightly better: there was a big improvement in negative words predicting negative returns compared to the first paper. However, the positive terms were still not predictive of positive returns.

A more recent modification, proposed by two other researchers, Jegadeesh and Wu, improved the sentiment measure by assigning different weights to words based on the market reaction to articles using those words. They use a historical regression model to estimate the weights and then adjust the sentiment calculation accordingly. Think about words such as "fraud" and "disaster", which may have a very high impact on returns, compared to other negative words that may not have such a big impact.
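The adaptation step can be illustrated as simple set arithmetic: start from a psychology-derived negative list and remove terms that are neutral in a financial-statement context. The word lists below are made-up examples, not the researchers' actual dictionary:

```python
# Sketch of adapting a psychology-derived word list to the financial-market
# context. These sets are illustrative examples only.

psych_negative = {"disaster", "awful", "liability", "depressed"}

# Terms that sound negative in everyday speech but are neutral accounting
# vocabulary (every balance sheet lists liabilities next to assets).
finance_neutral = {"liability"}

finance_negative = psych_negative - finance_neutral
print(sorted(finance_negative))  # "liability" no longer counts as negative
```

The same pruning would apply symmetrically to the positive list before recomputing the count-based sentiment score.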
The way the approach works is that the vector of word counts is regressed on market returns, and the regression results assign a weight to each term. If "fraud" indeed historically had a high negative impact, its weight is adjusted upward relative to other words that have not historically been associated with negative returns, whose weights are diminished by the regression. So the question is: does it work better? In fact, they show that the prediction using negative terms, now weighted differently, improves; and, something we had not seen before, positive terms under this approach are now also predictive of positive returns in the future.
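The regression-weighting idea attributed above to Jegadeesh and Wu can be sketched with synthetic data: regress returns on per-article term counts and use the fitted coefficients as term weights. The term list, return model, and use of plain least squares are illustrative assumptions, not their actual specification:

```python
# Sketch of regression-based term weighting: fit weights by regressing
# returns on term counts, then score new articles with those weights.
# All data here is synthetic; the term list is a made-up example.
import numpy as np

rng = np.random.default_rng(0)
terms = ["fraud", "disaster", "weak"]

# Rows = historical articles, columns = counts of each term in the article.
X = rng.integers(0, 4, size=(200, len(terms))).astype(float)

# Simulate next-day returns where "fraud" has the largest negative impact.
true_w = np.array([-0.05, -0.02, -0.005])
y = X @ true_w + rng.normal(0.0, 0.01, size=200)

# Ordinary least squares recovers a weight per term: high-impact words like
# "fraud" get large negative weights, mild ones get weights near zero.
weights, *_ = np.linalg.lstsq(X, y, rcond=None)
for t, w in zip(terms, weights):
    print(f"{t}: {w:+.4f}")

# Weighted sentiment of a new article = dot product of counts and weights.
new_counts = np.array([2.0, 0.0, 1.0])  # "fraud" twice, "weak" once
print("weighted sentiment:", new_counts @ weights)
```

The key contrast with the equal-weight count measure is visible in the fitted coefficients: two articles with the same raw negative-word count get different scores if one uses historically high-impact words.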