
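    Ekstraksi Data Alamat Indonesia dengan Named Entity Recognition Menggunakan Teknik BiLSTM + CRF (Indonesian Address Data Extraction with Named Entity Recognition Using the BiLSTM + CRF Technique)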

    Date
    2021
    Author
    Hakim, Arif Rahman
    Advisor(s)
    Gunawan, Dani
    Seniman
    Abstract
    Address formats in Indonesia vary widely because of the country's long history, its many ethnic groups and tribes, and its vast territory. Before address data can be processed and stored in a data warehouse, the information it contains must be extracted. Address data can include numbers, street names, urban villages (kelurahan), sub-districts (kecamatan), regencies, provinces, and postcodes. Some countries have templates that standardize how addresses are written, but Indonesia has no such standard, and the format can differ from region to region. This makes Indonesian address data extraction a distinct problem with a different level of difficulty than in other countries. This study aims to extract information from Indonesian address data so that the results can be used for further purposes. The extraction is carried out using Named Entity Recognition (NER) with the biLSTM and CRF techniques. It takes into account patterns and relationships between words, and is influenced by prepositions and the words that follow them. The evaluation shows that the NER method extracts the information well, with an F1-score of 0.9086.
     
    Beragamnya format pengalamatan di Indonesia dikarenakan sejarah panjang yang telah dilalui, etnik, suku serta luas wilayah Indonesia. Untuk alamat, sebelum diolah dan disimpan pada data warehouse diperlukan suatu proses untuk mengekstrak informasi yang terkandung. Data alamat dapat berupa nomor, nama jalan, kelurahan, kecamatan, kabupaten provinsi hingga kodepos. Beberapa negara memiliki templating untuk menyetarakan penulisan alamat, namun Indonesia belum memiliki standarisasi. Sehingga topik ekstraksi data alamat Indonesia menjadi unik dan memiliki tingkat kesulitan berbeda dibandingkan negara lain. Dimana bisa saja format penulisan alamat dapat berbeda tergantung daerahnya. Penelitian ini bertujuan untuk dapat mengekstraksi informasi dari data alamat Indonesia, sehingga hasil ekstraksi dapat digunakan untuk tujuan lain lebih lanjut. Pada penelitian ini, ekstraksi informasi pada data alamat Indonesia dilakukan menggunakan Named Entity Recognition (NER) dengan teknik biLSTM dan CRF. Ekstraksi mempertimbangkan pola, hubungan antar kata serta dipengaruhi kata depan dan kata di belakangnya. Hasil evaluasi penelitian menunjukkan bahwa metode NER bekerja dengan baik dalam mengekstraksi informasi dengan nilai F1-Score sebesar 0.9086.
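To make the reported F1-score concrete, the sketch below shows one common way NER output is scored: tags in the BIO scheme are grouped into entity spans, and precision, recall, and F1 are computed over exact span matches. The abstract does not say which tag scheme, label set, or F1 variant the thesis uses, so the labels (`STREET`, `NUMBER`, `DISTRICT`, `POSTCODE`), the made-up address, and the functions `bio_to_spans` and `entity_f1` are all assumptions for illustration, not the author's implementation.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[str, int, int]  # (label, start index, end index exclusive)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Group a BIO tag sequence into (label, start, end) entity spans."""
    spans: Set[Span] = set()
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.add((label, start, i))
            label = None
        # A "B-" tag always opens a span; a stray "I-" tag is treated leniently
        # as the start of a new span.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            label, start = tag[2:], i
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Entity-level F1: a predicted span counts only if label and bounds match."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_spans)
    recall = true_positives / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Hypothetical address, tokenized by hand; labels are illustrative only.
tokens = ["Jl.", "Sudirman", "No.", "12", "Kec.", "Medan", "Baru", "20154"]
gold = ["B-STREET", "I-STREET", "O", "B-NUMBER", "O",
        "B-DISTRICT", "I-DISTRICT", "B-POSTCODE"]
# The prediction truncates the district name to "Medan", so that span is wrong.
pred = ["B-STREET", "I-STREET", "O", "B-NUMBER", "O",
        "B-DISTRICT", "O", "B-POSTCODE"]

score = entity_f1(gold, pred)
print(score)  # 3 of 4 spans match on each side, so F1 is 0.75
```

Under this strict span-matching convention a partially correct entity earns no credit, so the same predictions would score higher under a per-token F1. Which convention lies behind the 0.9086 figure is not stated in the abstract.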

    URI
    https://repositori.usu.ac.id/handle/123456789/47670
    Collections
    • Undergraduate Theses [796]

    Repositori Institusi Universitas Sumatera Utara (RI-USU)
    Universitas Sumatera Utara | Perpustakaan | Resource Guide | Katalog Perpustakaan
    DSpace software copyright © 2002-2016  DuraSpace
    Contact Us | Send Feedback
    Theme by 
    Atmire NV