Pre-trained Language Models for the Legal Domain: A Case Study on Indian Law
2022
NLP in the legal domain has seen increasing success with the emergence of
Transformer-based Pre-trained Language Models (PLMs) pre-trained on legal text.
PLMs trained over European and US legal text are available publicly; however,
legal text from other domains (countries), such as India, has many
distinguishing characteristics. With the rapidly increasing volume of Legal NLP
applications in various countries, it has become necessary to pre-train such
LMs over legal text of other countries as well. In this work, we attempt to
investigate pre-training in the Indian legal domain. We re-train (continue
pre-training) two popular legal PLMs, LegalBERT and CaseLawBERT, on Indian
legal data, as well as train a model from scratch with a vocabulary based on
Indian legal text. We apply these PLMs over three benchmark legal NLP tasks --
Legal Statute Identification from facts, Semantic Segmentation of Court
Judgment Documents, and Court Appeal Judgment Prediction -- over both Indian
and non-Indian (EU, UK) datasets. We observe that our approach not only
enhances performance on the new domain (Indian texts) but also over the
original domain (European and UK texts). We also conduct explainability
experiments for a qualitative comparison of all these different PLMs.
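The abstract describes continued pre-training only at a high level. LegalBERT and CaseLawBERT are both BERT-based, so the following assumes the usual BERT masked-language-modelling (MLM) objective. Under that assumption, the core data-preparation step would look roughly like this minimal sketch. The function name `mask_tokens`, the 15% selection rate, and the 80/10/10 replacement split come from the original BERT recipe. They are assumptions here, not details taken from this paper. The example sentence is also hypothetical.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"

def mask_tokens(
    tokens: List[str],
    vocab: List[str],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """BERT-style MLM masking (assumed objective, not confirmed by the paper).

    Each position is selected as a prediction target with probability
    mask_prob. A selected token is replaced by [MASK] 80% of the time,
    by a random vocabulary token 10% of the time, and left unchanged
    the remaining 10% of the time.
    """
    rng = rng or random.Random()
    inputs = list(tokens)
    # None marks positions the model is NOT asked to predict.
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)
        # else: keep the original token unchanged
    return inputs, labels

# Hypothetical usage on one tokenised sentence of Indian legal text.
sentence = "the appellant was convicted under section 302 of the indian penal code".split()
masked_inputs, targets = mask_tokens(sentence, vocab=sentence, rng=random.Random(0))
```

In continued pre-training, the same procedure would simply run over the new (Indian) corpus, starting from the existing LegalBERT or CaseLawBERT weights and vocabulary. Training from scratch would instead begin from random weights and a vocabulary built on the new corpus.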
Reference Key | ghosh2022pretrained |
---|---|
Authors | Shounak Paul; Arpan Mandal; Pawan Goyal; Saptarshi Ghosh |
Journal | arXiv |
Year | 2022 |
DOI | DOI not found |