
    PolitiKweli: A Swahili-English Code-switched Twitter Political Misinformation Classification Dataset

    View/Open
    14_politikweli_a_swahili_english_.pdf (401.2Kb)
    Publication Date
    2023-08-30
    Author
    Amol, Cynthia Jayne
    Awuor, Lilian Diana Wanzare
    Abstract/Overview
    In the age of freedom of speech, users of the social media platform Twitter post millions of messages per day. These messages are not always fact-checked, resulting in misinformation, which is false or misleading news. Misinformation classification involves identifying and classifying text as either false or fact by comparing the text against fact-checked news. On political matters, misinformation online can result in mistrust of political figures, polarization of communities and violence offline. Existing studies mostly address misinformation detection for messages written in a single language such as English. Among most bilingual or multilingual user groups in countries like Kenya, Swahili-English code-switching and code-mixing is common practice in informal text-based communication such as messaging on social media platforms like Twitter. There is therefore a need for more research in low-resource languages such as Swahili. The PolitiKweli dataset introduced by this study, which is a novel Swahili-English misinformation classification dataset, contains 6,345 Swahili-English texts, 22,957 English texts and 211 Swahili texts. The texts are labelled as fake, fact or neutral by comparison against a fact-checked dataset also created for this study. The dataset curation process, including data collection, processing and annotation, is explained. Challenges during annotation are also discussed. The results of experiments conducted using a pretrained language model demonstrate the dataset’s usefulness in training Swahili-English code-switched misinformation classification models.
    Permalink
    https://repository.maseno.ac.ke/handle/123456789/6046
    Collections
    • Department of Information Technology [13]

    Maseno University. All rights reserved | Copyright © 2022 
    Contact Us | Send Feedback

     

     
