ČT 24: Studio 6, the morning broadcast

On 31 January 2023, we presented the results of our research in the morning broadcast of Czech Television.


Transcript of the interview:

Krásná: You can tell who the real author of a text is from keywords that are usually almost invisible, or find out information about him that would otherwise remain hidden. This is what a new technology can do, developed by a team of linguists from the Faculty of Arts at Palacký University in Olomouc and patented in the United States. It uses artificial intelligence to search for patterns in texts, helping recruiters select employees, salespeople target advertising, and investigators in their work. The authors, Martina Benešová and Dan Faltýnek from the Department of General Linguistics at the Faculty of Arts, Palacký University in Olomouc, will explain exactly how it works and how it differs from other methods. Hello to both of you.

Benešová: Hello, hello from Ostrava.

Faltýnek: Hello.

Krásná: I will ask a question.

Faltýnek: Well, basically, we just need a longer stretch of text to extract the basic features that we’re looking at. Then, based on these text fingerprints, we can pick out that particular author from among other authors with a certain level of probability, which we have established by training on a test sample.

Krásná: That longer stretch of text, how many words might that be?

Benešová: This is still in the testing phase, but we are now sure that around six thousand words will be enough, and our latest analyses suggest that we will be able to go even lower than that.

Krásná: Keywords are also important. Just as an example, what are they typically?

Faltýnek: They can be pronouns, they can be verbs. For some people they are more content words, the different themes they focus on. For others they are more the means of textual construction, which they use to build the text and to create sentences or link them together. It depends.

Benešová: Yes, these are simply words that the individual, the author, unconsciously uses to express himself and unconsciously repeats over long stretches of his textual production. We can’t say outright that it’s a noun or a pronoun, for example. It differs from person to person, and that is exactly what the individual’s fingerprint in the text is about.

Krásná: And does it depend on how far apart they’re spaced or where they are in the text?

Faltýnek: That’s very significant. We focus precisely on devices that are very far apart, so the author is not fully aware of them, or of the fact that he’s putting them in the text. It is precisely these devices that we extract from the text, and on the basis of them we can be fairly certain that it is Charles and not Peter who is speaking.

Krásná: The second use is for so-called profiling. You are able to tell some information about an unknown person from the text. Could you describe that too? What kind of information, for example?

Benešová: Since the author uses these key words in the text unconsciously, he unconsciously reveals topics that are weighing on him or that he doesn’t even know about, as various texts have shown us. For example, and it is a popular although somewhat sad topic, we have analysed some mass murderers, and we have also analysed famous authors. With the mass murderers, it turned out that the texts they produced before the acts contained, for example, the places of the crime, or the themes that weighed on them and led them to do it.

Krásná: I did notice that you can identify maybe an age group. I can imagine that, because I use different words than, say, my eighteen-year-old son. However, can you perhaps also determine the gender?

Faltýnek: That is also possible. In the case of gender and Czech, it’s quite simple, because in Czech we have grammatical gender. But of course gender and age are some of the basic features that are considered necessary in automatic text analysis, and that’s important because the method that we are developing is fully automated. It doesn’t require a human to sit at it. It just passes the data to a particular interpreter, who can extract additional information from it, whether it’s in HR, in investigations or in other activities.

Benešová: I would add that this method is language independent. Although we were talking about Czech, where gender is quite easy to determine, the method by its very nature works independently of language.

Krásná: So does it matter what kind of text it is and where it comes from, let’s say, whether it’s from social media or it’s a newspaper article?

Faltýnek: It’s best if it’s natural communication, which is what we like best, because there are the most characters, but we can use basically any written or spoken text, because the author, whether he wants to or not, always leaves his fingerprint in the text.

Benešová: If we want to reveal the individual, to identify the profile, the fingerprint in the text, what matters is that the text was produced by that author, by that individual. Otherwise nothing else matters.

Krásná: You have patented this technology, so a follow-up question: does it have a name, and what does the step of patenting it mean to you?

Faltýnek: It doesn’t have a name as such, but it’s basically a method of personalization down to a specific person in digital communication. We are able to recognize an individual completely on his own and to target digital communication at that individual. Current technologies, Google, Adobe and other such players in the market, can’t do that; they can’t recognize a specific person based on his language production. We can, including those specific key themes, and we are then able to influence that person, to target specific content at that person.

Benešová: The technology is unique, as we have already said, so patenting it means that we are trying to protect it. As far as the name is concerned, it has a wide range of applications. As you said, there is HR, then there is protection, the area of national security and so on. These individual directions of ours, these applications, already have names, like Deep sense and Deep projector and so on. As for the technology itself, we’ll think about it and we’ll definitely call it something.

Krásná: And the reactions to this patent, maybe from abroad or from people who work on this issue: have you seen any interest in this technology of yours?

Faltýnek: So far, only the US Navy has contacted us, thanks to the rector of Palacký University, Martin Procházka, and we are gradually negotiating with other companies and entities that would like to use this.

Krásná: You have already mentioned some of the possibilities for using it in practice. Where would it probably find the widest application? Maybe those who have expressed an interest have already given you an answer to that?

Benešová: Well, yes, in the area of security, so the detection of an individual, or protection from misinformation, that is quite clear. But it’s also HR, where of course it saves on manpower costs because it’s an automated process, and the job interview doesn’t even have to happen in real time. We can reach into the past and take real texts produced by that author, for instance from chats, social networks and things like that. So, of course, there too. And then definitely psychotherapy. Or if we talk about communication between, for example, a state and an individual: if you personalize that communication, if you address that individual in his own language, it definitely makes the communication easier, both for the party that’s communicating and for the party that’s being addressed.

Krásná: Clearly, this method can distinguish that the text was written by a human, not by an artificial intelligence.

Faltýnek: We can indeed do that, but here some other methods will probably be more successful and effective. Anyway, we can tell whether it is an artificial intelligence or a human precisely because the machine does not fall into those particular mannerisms. Artificial intelligences don’t have them, humans do, and that’s what makes the method applicable in this area as well.

Benešová: So the characteristics that we rely on simply do not appear in the text if it is produced by a machine.

Krásná: I would like to ask you to comment on the graph that we have here, because the output of this research can actually be a graph. We have one particular one that we’re going to show, which was created from the text, as you mentioned, of Elliot Rodger, who killed six people in the United States in 2014, then shot himself. When we look at that graph, what can we read from it?

Benešová: Well, I’ve already touched on it here. There are the topics that he was troubled by, he was troubled by women and girls, and then there are also the places. It’s called a word cloud, and it shows the keywords that we talked about here. Among those keywords, to our surprise, the location of the future crime was repeated several times, plus the subject of the women that bothered him, plus the family, you see, father, mother, and school.

Krásná: Martina Benešová, Dan Faltýnek. Thank you both very much for the interview. Congratulations on the success that you have had, and I wish you many more to come. Goodbye.

Benešová: Thank you very much, have a good day, goodbye.

Faltýnek: Goodbye.


