Fake news consists of "news" items that the author knows to be false and publishes with an intent to deceive. Researchers in linguistics have shown that the language of fake news, i.e., its linguistic features, merits close study and may be the key to detecting it.
By analysing the linguistic features of fake news items, the project team will enable the detection and flagging of deliberate disinformation while excluding, for example, inadvertent misinformation, satirical texts and opinion texts. The project will thus take societal safety and security into consideration while safeguarding freedom of speech.
The Fakespeak project involves a core team of linguists and computer scientists based in Norway and the UK. The linguists will seek to reveal the grammatical and stylistic features of the language of fake news (referred to as Fakespeak) in Russian, Norwegian and English, and the computer scientists will develop tools to detect these features automatically in news items. The project is led by the Department of Literature, Area Studies and European Languages at the University of Oslo.
To achieve this goal, the team will first build new corpora and make use of existing corpora of fake and real news from various online media outlets in all three languages, and then subject the datasets to thorough linguistic analyses. They will apply methods and draw on insights from corpus linguistics, computational linguistics and applied linguistics, including forensic linguistics, as well as pragmatics and rhetoric. Drawing on the linguists' findings, the computer scientists will seek to improve existing fake news detection systems by applying and developing machine learning models, algorithms and knowledge graphs, sometimes in combination.
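As a rough illustration of what "linguistic features" can mean in practice, the sketch below computes a handful of surface-level stylistic measures from a text. It is not the project's method: the function name `stylistic_features`, the choice of features and the English-only pronoun list `FIRST_PERSON` are all assumptions made for this example, and real analyses would rely on far richer, language-specific features.

```python
import re

# Hypothetical, English-only pronoun list chosen for illustration.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}


def stylistic_features(text: str) -> dict[str, float]:
    """Return a few simple stylistic measures for one text."""
    # Runs of Unicode letters, so Norwegian and Russian words are matched too.
    tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())
    # Naive sentence split on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Guard against division by zero on empty input.
    n_tokens = len(tokens) or 1
    n_sentences = len(sentences) or 1
    return {
        "token_count": float(len(tokens)),
        "avg_word_length": sum(len(t) for t in tokens) / n_tokens,
        "type_token_ratio": len(set(tokens)) / n_tokens,
        "avg_sentence_length": len(tokens) / n_sentences,
        "exclamations_per_sentence": text.count("!") / n_sentences,
        "first_person_ratio": sum(t in FIRST_PERSON for t in tokens) / n_tokens,
    }


if __name__ == "__main__":
    sample = "We won! They lost. We won again!"
    for name, value in stylistic_features(sample).items():
        print(f"{name}: {value:.3f}")
```

Feature vectors of this kind could, in principle, serve as input to a machine learning classifier alongside other signals; which features actually distinguish fake from real news is exactly what the linguistic analyses are meant to establish.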
SINTEF will actively contribute to developing new corpora and will focus on developing robust fact-checking tools that incorporate input from the linguists in the project team.