Welcome. Today we have with us PhD candidate Barend Beekhuizen; we're very happy to have you. He is specializing in first language acquisition. Specifically, you use computer models to find out things about first language acquisition. Barend, can you tell me something about what you do? >> Yes, of course. I'm currently doing my PhD here in Leiden, and I'm working on child language acquisition and how to study it with computational models. >> Well, I know nothing about this. Could you explain some basic things about what you're doing? >> Yeah, sure. So what I do is look at first language acquisition: how children learn the grammar of their mother tongue. And as we all know, when children are one and a half years old, they don't produce sentences like "Mother, would you please hand me the ball?" They would say "gimme ball" or something like that, right? So they produce these very short, limited sentences. And I'm trying to understand why they do so, because in fact, we don't really know. There are different explanations. Some people say it's incomplete knowledge of the grammar. Other people say it's memory constraints. There are various explanations. So what I'm trying to do is build a computer model that receives language as a child would: it gets the utterances an adult would produce, and it tries to extract all sorts of regularities and rules from them in a very constrained way. It doesn't process everything and calculate everything at the same time; it gradually builds up its knowledge of the language. By doing so, the model simulates what children do when they learn a language. So it mimics, or simulates, the processes that we find in a child. >> Yeah. >> The reason we're doing so, of course, is that we would like to understand how a child's brain works, right? >> Mm-hm. >> So we're trying to simulate that by building a piece of software, and thereby understand what's going on: what are the mechanisms, what are the processes? >> So the simulator part is like a flight simulator, but then you have a child simulator. >> In a way it is, yes, definitely. As in a flight simulator, you have to simplify. Because in the flight simulator, of course, you have the winds that are- >> Yeah. >> hitting your plane, and there are different sorts of things; I don't know exactly what happens in a flight simulator. >> [LAUGH]. >> But you have to simplify it; it's not the real world. >> True. >> The same goes for a child simulator: there are all sorts of things happening in the real world that are not in my computer model. I'm simplifying, taking some things on board and other things not, and then trying to see if it does the same thing as a child would. >> Okay, and what kind of input information do you give to the computer model? Is it sound, speech or- >> Mm-hm. >> written language or-? >> Well, it is spoken language, but it is transcribed spoken language. >> Okay. >> So there's a lot of data on how parents talk to their children; people have been recording this for decades now, and these recordings have been transcribed. >> Okay. >> So we've got these transcriptions of caregiver-child interactions. And then you just have the utterances, right? So you just get sentences like "Oh, will you get the ball for me?", "Now, put it in there," etc. And, of course, we don't get the meaning or the situations with that.
So what we're trying to do is basically find out what would realistically be there in a situation in which the parent says, "Put it in there": what kinds of objects and situations are present at that moment. >> So you sort of attach meaning in the model, and you sort of stick it onto the speech. >> Exactly. >> In the transcription. >> Onto the transcribed speech, yeah. Other people are, for instance, working on how children extract what sounds their language has, the phonology of their language. I don't do that; I take it for granted, in a way. I assume that the child already knows, more or less, what the word boundaries are and what the different phonemes of the language are. So I just look at whole words, how they are processed in the context of a situation, and what kinds of rules govern the combination of words, that is, how the grammar of a language works. >> So you're specifically looking at first language acquisition and how a child learns that, and I assume you do that because we don't yet know how it works. But it seems to me that we also don't know how that works for adult language, for our own language. For example, when you look at Google Translate, it makes, to me, a lot of mistakes. So how do those relate to each other? Why does it make so many mistakes, and how can we solve that problem? >> [LAUGH]. >> Oh yeah, you're absolutely right that we also know precious little about how it works in adults, and that's definitely one of the reasons why Google Translate isn't perfect. I mean, Google Translate is a great tool; it works quite well for what it's supposed to do: giving you more or less word-for-word translations from one language to the next. And I believe the reason why it makes those mistakes is that if you really want an accurate computer-driven translation from one language to another, you need a full grammatical model of each of the languages- >> Mm-hm. >> plus a mapping between the grammatical rules of the one language and those of the other. That's a massive amount of information you need. Google Translate basically takes a stupid approach. >> [LAUGH] >> It says: okay, we have parallel texts. For instance- >> The Bible? >> The Bible. >> For example. >> Or the proceedings of the European Parliament. >> Mm-hm. >> And then it says: okay, we have this sentence in this language and that sentence in that language. Let's see what words or groups of words map onto each other between those languages, across different sentences. Which words occur together often enough that we can assume they are translations of each other? That's basically the take that Google Translate has on translation. So it's very simple; it ignores the rules of the language, but it still does quite well. And that's interesting, given that apparently we don't need that many rules to get a quite decent translation from one language to another. >> So Google Translate really uses words, and not the meanings. >> Yeah. >> And what you try to do is use the meanings. >> Exactly, yeah. Google Translate basically maps strings of words in one language to strings of words in another language, and assumes that there is nothing in between. But then, Google Translate is of course a commercial model, you know? It's a product, a tool; it doesn't try to model how language works in our minds. >> Mm-hm.
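To make the co-occurrence idea concrete, here is a minimal sketch of how translation pairs can be guessed from parallel text alone. It is not Google's actual system, which used far more sophisticated alignment and language models; the toy corpus, the Dice-coefficient scoring, and the function names are all invented for illustration:

```python
from collections import Counter
from itertools import product

# Toy parallel corpus (hypothetical English-Dutch sentence pairs).
parallel = [
    ("the ball is red", "de bal is rood"),
    ("the ball is small", "de bal is klein"),
    ("the house is red", "het huis is rood"),
    ("a small ball", "een kleine bal"),
]

# Count, for every (source word, target word) pair, how often the two
# occur in aligned sentences, plus each word's own sentence frequency.
pairs, src_freq, tgt_freq = Counter(), Counter(), Counter()
for src, tgt in parallel:
    s, t = set(src.split()), set(tgt.split())
    src_freq.update(s)
    tgt_freq.update(t)
    pairs.update(product(s, t))

def best_translation(word):
    """Rank target words by the Dice coefficient, 2*joint/(f_src+f_tgt):
    words that almost always occur in each other's sentences score ~1."""
    scores = {t: 2 * pairs[word, t] / (src_freq[word] + tgt_freq[t])
              for t in tgt_freq if pairs[word, t] > 0}
    return max(scores, key=scores.get)

print(best_translation("ball"))  # -> bal
print(best_translation("red"))   # -> rood
```

With only four sentence pairs this already singles out "bal" and "rood", exactly because they co-occur with "ball" and "red" more reliably than frequent filler words like "is" or "de" do.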
>> So if you want to know how language works in the mind, you simply cannot avoid meaning; meaning is a crucial part of that. And that's definitely what I'm trying to do with my model at this moment, and what I'm planning to do in the future as well. >> Yeah, because what exactly are you planning to do in the future with the model? >> So in my current PhD project I've been looking at the acquisition of grammar- >> Mm-hm. >> that is, how the rules for combining words into larger wholes are acquired by children throughout the acquisition process. In my future research I would like to focus more on the acquisition of word meanings, because I haven't been able to study them in as much detail as I'd like to. And it's a huge problem, if you think about the myriad ways in which languages cut up the meaning space. >> The meanings of words. >> Right, so languages can categorize things very differently from one another. >> Could you maybe give an example of this? >> Right. What I've been studying last year is basically how children learn the meanings of Dutch prepositions. Dutch prepositions are very similar to English prepositions. So English has on and in, right? On means that there is surface support, something is being supported by another surface, and in means containment, something is being contained by something else. Now Dutch, and we know this from research by Melissa Bowerman, has the preposition aan, which is very strange. Aan basically means that there is some sort of tenuous contact. A coat on a hook would be described in Dutch with aan: aan de haak. Or the painting is on the wall: het schilderij aan de muur. We don't use op the way English uses on. So in English, those tenuous-contact relations are grouped with on; in Dutch they are not, they have a separate preposition. Now this separate preposition, aan, is a very unique one, typologically speaking. In the context of the world's languages, most languages group this tenuous-support relation either with the on-like prepositions, the support ones, or with the in-like prepositions, the containment ones, right? >> Mm-hm. >> So Dutch does things differently. And Melissa Bowerman's idea was that Dutch children have problems with the acquisition of aan, that they make errors when learning aan, because of its cognitive unnaturalness. And we know- >> Yeah, so it's something you have to learn, right? >> Right, it's really something you have to pick up from your language. Your language makes this particular category distinction and other languages don't. >> And do you then also see that they acquire it later and that they make mistakes- >> Yeah. >> with it? >> Right, exactly. So what we did in our modeling work is basically take speakers of a lot of languages and have them describe different situations: cups on tables, coats on hooks, paintings on walls, etc. >> Mm-hm. >> And then, on that big matrix of data, that big table of data, we use statistical techniques to extract its core dimensions: the dimensions along which languages vary the most and the least. In the resulting model, situations end up closer to each other the more often they are expressed with the same adposition across languages.
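As a rough illustration of that kind of analysis, and not the study's actual pipeline, here is a sketch: a toy table of adposition choices in three languages, a situation-by-situation similarity matrix (the share of languages using the same adposition for both situations), and a classical-MDS-style eigendecomposition to pull out the main dimensions. The data, languages, and number of dimensions are all invented:

```python
import numpy as np

# Hypothetical elicitation data: which adposition a speaker of each
# language used for each spatial situation.
situations = ["cup on table", "pen on table", "coat on hook",
              "painting on wall", "coffee in cup", "fish in bowl"]
choices = {
    "English": ["on", "on", "on", "on", "in", "in"],
    "Dutch":   ["op", "op", "aan", "aan", "in", "in"],
    "Spanish": ["en", "en", "en", "en", "en", "en"],
}

# Similarity between two situations: the fraction of languages that
# express them with the same adposition.
n = len(situations)
sim = np.zeros((n, n))
for terms in choices.values():
    for i in range(n):
        for j in range(n):
            sim[i, j] += terms[i] == terms[j]
sim /= len(choices)

# Double-centre the similarity matrix and take the top eigenvectors:
# a rough stand-in for classical multidimensional scaling, giving
# coordinates in which often-co-labelled situations end up close.
centred = sim - sim.mean(0) - sim.mean(1)[:, None] + sim.mean()
eigvals, eigvecs = np.linalg.eigh(centred)
coords = eigvecs[:, -2:] * np.sqrt(np.maximum(eigvals[-2:], 0))

for s, xy in zip(situations, coords):
    print(f"{s:16s} {xy.round(2)}")
```

Even in this tiny example, the two tenuous-contact situations (coat on hook, painting on wall) land between the clear support cases and the containment cases, because English and Spanish group them with the former while Dutch sets them apart.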
So the cup on the table and, for instance, the pen on the table would be characterized by the same adposition in most languages. The coffee in the cup and the fish in the fishbowl also get the same adposition. But between a coat on a hook and a cup on a table there's a lot of difference. So what we find is that situations like the coat on the hook sit somewhere halfway between the in-like and on-like situations, right? If you then train a computational model that learns, in that space, what words mean, it basically says: I draw a circle around the cases that are op in Dutch, or on in English. When it tries to do so, it initially makes the error of grouping a lot of cases where we, as adult speakers of Dutch, would have to use aan, together with the op cases; instead of setting up a separate preposition, it makes errors there. And indeed, as you said, the model also acquires the meaning of aan later than that of op. >> That's really interesting. >> Yeah. >> So that's what we found with the computational model. I mean, the findings are old; we already knew this from Melissa Bowerman's work, but we modeled it. >> Yeah. >> Yeah. >> Okay. >> Okay, so we would of course like to wish you a lot of success with that. We were very happy to have PhD candidate Barend Beekhuizen with us today. And he has a final question for you. >> So in a question you'll see four pictures of different situations. And my question to you would be: how would you describe these situations in your language?
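For readers who want to see the mechanism from the interview spelled out, below is a deliberately simplified sketch of a prototype-based categorization learner of the kind described above. It is not the actual model: the one-dimensional meaning space, the input frequencies, and the nearest-prototype rule are all invented for illustration.

```python
# Toy semantic space: one dimension from containment (0.0) to firm
# surface support (1.0). Tenuous-contact (aan) situations sit in
# between, at ~0.6 here, and are assumed to be rare in the input.
stream = []
for i in range(40):
    stream.append([("in", 0.1), ("op", 0.9), ("in", 0.2), ("op", 0.8)][i % 4])
    if i % 9 == 8:                      # aan examples arrive only rarely
        stream.append(("aan", 0.6))

prototypes, counts = {}, {}

def observe(word, point):
    """Update the word's prototype: a running mean of its observed
    examples (the 'circle' the model draws around a word's uses)."""
    counts[word] = counts.get(word, 0) + 1
    prototypes[word] = prototypes.get(word, 0) + (
        point - prototypes.get(word, 0)) / counts[word]

def produce(point):
    """Describe a situation with the word whose prototype is nearest."""
    return min(prototypes, key=lambda w: abs(prototypes[w] - point))

for step, (word, point) in enumerate(stream, start=1):
    observe(word, point)
    if step in (5, len(stream)):
        # Early in this toy run the coat-on-the-hook case (0.6) is
        # overextended to op; once enough aan examples have come in,
        # aan gets its own prototype and takes over.
        print(f"step {step}: coat on hook -> {produce(0.6)}")
```

Run as-is, this prints op at step 5 and aan at the end: the same overextension-then-correction pattern the interview describes for Dutch children and for the model.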