Adjusting BERT’s Pooling Layer for Large-Scale Multi-Label Text Classification

dc.contributor.author: Lehečka, Jan
dc.contributor.author: Švec, Jan
dc.contributor.author: Ircing, Pavel
dc.contributor.author: Šmídl, Luboš
dc.date.accessioned: 2021-02-22T11:00:20Z
dc.date.available: 2021-02-22T11:00:20Z
dc.date.issued: 2020
dc.description.abstract-translated: In this paper, we present our experiments with BERT models in the task of Large-scale Multi-label Text Classification (LMTC). In the LMTC task, each text document can have multiple class labels, while the total number of classes is in the order of thousands. We propose a pooling layer architecture on top of BERT models, which improves the quality of classification by using information from the standard [CLS] token in combination with pooled sequence output. We demonstrate the improvements on Wikipedia datasets in three different languages using public pre-trained BERT models. (en)
dc.format: 8 pages (cs)
dc.format.mimetype: application/pdf
dc.identifier.citation: LEHEČKA, J., ŠVEC, J., IRCING, P., ŠMÍDL, L. Adjusting BERT’s Pooling Layer for Large-Scale Multi-Label Text Classification. In: Text, Speech, and Dialogue 23rd International Conference, TSD 2020, Brno, Czech Republic, September 8-11, 2020, Proceedings. Cham: Springer, 2020. pp. 214-221. ISBN 978-3-030-58322-4, ISSN 0302-9743. (cs)
dc.identifier.doi: 10.1007/978-3-030-58323-1_23
dc.identifier.isbn: 978-3-030-58322-4
dc.identifier.issn: 0302-9743
dc.identifier.obd: 43930358
dc.identifier.uri: 2-s2.0-85091136861
dc.identifier.uri: http://hdl.handle.net/11025/42716
dc.language.iso: en (en)
dc.project.ID: DG18P02OVV016/Development of a centralized interface for mining big data from web archives (cs)
dc.publisher: Springer (en)
dc.relation.ispartofseries: Text, Speech, and Dialogue 23rd International Conference, TSD 2020, Brno, Czech Republic, September 8-11, 2020, Proceedings (en)
dc.rights: The full text is not accessible. (cs)
dc.rights: © Springer (en)
dc.rights.access: closedAccess (en)
dc.subject.translated: Text classification, BERT model (en)
dc.title: Adjusting BERT’s Pooling Layer for Large-Scale Multi-Label Text Classification (en)
dc.type: conference paper (cs)
dc.type: conferenceObject (en)
dc.type.status: Peer-reviewed (en)
dc.type.version: publishedVersion (en)