Train Word embeddings from scratch with Nessvec and PyTorch
Hobson and his colleagues try to figure out how to train word embeddings from scratch using the WikiText2 dataset in PyTorch. The WikiText2 dataset contains redacted words, but they were unable to find the "labels" that reveal the words masked with the symbol ``. If you try to use the `Wik...
Autor Corporativo: | |
---|---|
Otros Autores: | |
Formato: | Video |
Idioma: | Inglés |
Publicado: |
[Place of publication not identified] :
Manning Publications
[2022]
|
Edición: | [First edition] |
Materias: | |
Ver en Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009825927106719 |
Sumario: | Hobson and his colleagues try to figure out how to train word embeddings from scratch using the WikiText2 dataset in PyTorch. The WikiText2 dataset contains redacted words, but they were unable to find the "labels" that reveal the words masked with the symbol ``. If you try to use the `Wikipedia` package to retrieve Wikipedia pages directly, you may hit the `suggest` bug. There are more than 100 unanswered issues on the project, and the maintainer has pushed any changes for many years. The Tangible AI fork on GitLab fixes this search suggestion bug so we could easily crawl Wikipedia. Unfortunately, the Wikipedia-API package is not very useful for searching and crawling Wikipedia to retrieve text. |
---|---|
Descripción Física: | 1 online resource (1 video file (41 min.)) : sound, color |