    Faculty of Computer Science and Information Technology, Department of Information Technology

    Coreference Resolution untuk Teks Bahasa Indonesia Menggunakan Random Forest Classifier

    Coreference Resolution for Indonesian Text Using Random Forest Classifier

    View/Open
    Cover (902.8Kb)
    Fulltext (4.558Mb)
    Date
    2024
    Author
    Sari, Nia Ulan
    Advisor(s)
    Purnamawati, Sarah
    Rahmat, Romi Fadillah
    Abstract
    Coreference resolution is a subtask of Natural Language Processing (NLP) that identifies which expressions in a text refer to the same entity. In Indonesian text, and in novels in particular, coreference resolution is important because the language is complex and novels contain a rich variety of entities and references: characters interact with one another, and the same character may be referred to many times. A further difficulty is that possessive pronouns, which are common in novels, usually appear as affixes (such as the suffixes -ku, -mu, and -nya) rather than as standalone words, which makes it harder to determine which entity they refer to. This research therefore addresses coreference resolution for Indonesian text by detecting affixed possessive pronouns with a Random Forest Classifier. Part-of-Speech (POS) tagging and Named Entity Recognition (NER) are also used to improve the detection of person entities. Eighteen novels were used as training data and ten as test data; after pre-processing, the training data contains 109,306 entity-pronoun pairs and the test data contains 4,938 pairs. RandomizedSearchCV is used to search for the best Random Forest hyperparameters during training. Evaluated with a confusion matrix over the full test set, the model achieves an accuracy of 85.5%, a precision of 85%, a recall of 82.2%, and an F1-score of 83.6%.
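    The pair-based setup summarized in the abstract can be illustrated with a minimal, standard-library Python sketch. Everything in it is an assumption for illustration, not the thesis's implementation: the suffix list (-nya, -ku, -mu), the hypothetical exception set NOT_POSSESSIVE, the hypothetical helpers split_possessive, extract_mentions, candidate_pairs, f1_from and metrics, and the choice to pair each pronoun only with person mentions that precede it. A plain set of names stands in for NER output, and no Random Forest is trained here.

```python
from dataclasses import dataclass

# Assumed inventory of Indonesian possessive enclitics: -nya (his/her/its), -ku (my), -mu (your).
POSSESSIVE_SUFFIXES = ("nya", "ku", "mu")
# Hypothetical, deliberately tiny list of words that merely end in those letters.
NOT_POSSESSIVE = {"aku", "kamu", "buku", "ilmu", "hanya", "punya", "tanya"}


def split_possessive(token):
    """Return (stem, suffix) if token looks like stem + possessive suffix, else None."""
    lower = token.lower()
    if lower in NOT_POSSESSIVE:
        return None
    for suffix in POSSESSIVE_SUFFIXES:
        # Require a stem of at least 3 characters so short words are not split.
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return token[: -len(suffix)], suffix
    return None


@dataclass(frozen=True)
class Mention:
    index: int  # token position in the text
    text: str   # surface form, e.g. "Budi" or "-nya"
    kind: str   # "PERSON" or "PRONOUN"


def extract_mentions(tokens, person_names):
    """Collect person mentions (from a stand-in NER set) and affixed pronouns."""
    mentions = []
    for i, tok in enumerate(tokens):
        if tok in person_names:
            mentions.append(Mention(i, tok, "PERSON"))
            continue
        split = split_possessive(tok)
        if split is not None:
            mentions.append(Mention(i, "-" + split[1], "PRONOUN"))
    return mentions


def candidate_pairs(mentions):
    """Pair each pronoun with every PERSON mention that precedes it."""
    persons = [m for m in mentions if m.kind == "PERSON"]
    return [
        (person, pronoun)
        for pronoun in mentions
        if pronoun.kind == "PRONOUN"
        for person in persons
        if person.index < pronoun.index
    ]


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall, f1_from(precision, recall)


if __name__ == "__main__":
    # "Budi membawa bukunya ke rumah Ani." ("Budi took his book to Ani's house.")
    tokens = ["Budi", "membawa", "bukunya", "ke", "rumah", "Ani", "."]
    pairs = candidate_pairs(extract_mentions(tokens, {"Budi", "Ani"}))
    print([(p.text, q.text) for p, q in pairs])  # [('Budi', '-nya')]
    # Consistency check on the reported scores: F1 from P = 85% and R = 82.2%.
    print(round(100 * f1_from(0.85, 0.822), 1))  # 83.6
```

    In a setup like this, each (person, pronoun) pair would become one feature vector that a classifier labels as coreferent or not. A suffix rule on its own over-generates, since -nya also serves as a definiteness marker and nominalizer, which is one reason a learned classifier is attractive here. The last line also shows that the reported F1-score of 83.6% agrees with the reported precision (85%) and recall (82.2%).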
    URI
    https://repositori.usu.ac.id/handle/123456789/96434
    Collections
    • Undergraduate Theses [767]

    Repositori Institusi Universitas Sumatera Utara (RI-USU)
    Universitas Sumatera Utara | Perpustakaan | Resource Guide | Katalog Perpustakaan
    DSpace software copyright © 2002-2016  DuraSpace
    Contact Us | Send Feedback
    Theme by 
    Atmire NV
     

     
