stefan-it
🤓
hacking 🎧

Organizations

@flairNLP @Hugging-Face-Supporter @GermanT5 @Hugging-Face-Helping-Hand @LEL-A

Raw data, scripts, etc. to produce the tables and figures of our technical report

5 1 Updated May 2, 2024

A byte-level decoder architecture that matches the performance of tokenized Transformers.

Jupyter Notebook 32 4 Updated Apr 24, 2024

Inspection tool for characterizing the semantic compositionality of subword tokenization in English

Python 3 Updated Apr 23, 2024

Evaluation of language models on mono- or multilingual tasks.

Python 60 11 Updated May 5, 2024

Investigating Gender Bias in Turkish Language Models

Jupyter Notebook 1 Updated Apr 30, 2024

Experiments for efforts to train a new and improved T5

Python 75 5 Updated Apr 15, 2024

Official implementation of "A Multi-level Framework for Accelerating Training Transformer Models"

Python 4 Updated Apr 15, 2024

Open weights language model from Google DeepMind, based on Griffin.

Python 513 19 Updated Apr 14, 2024

Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'

Python 442 30 Updated May 3, 2024

BEAR dataset

5 Updated Apr 8, 2024
Jupyter Notebook 1 Updated Apr 5, 2024

Temporarily remove unused tokens during training to save RAM and increase speed.

Python 20 2 Updated Apr 5, 2024

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

Python 892 51 Updated May 1, 2024
20 1 Updated Mar 26, 2024

Minimal keyword extraction with BERT

Python 3,229 327 Updated Mar 21, 2024

Evaluation of the Fundus News Scraper https://github.com/flairNLP/fundus

Python 6 1 Updated Apr 2, 2024

Hetzner Online Community Project

Markdown 264 326 Updated May 3, 2024

ChroniclingAmericaQA: A Large-scale Question Answering Dataset based on Historical American Newspaper Pages

4 1 Updated Feb 10, 2024

OCR, layout analysis, reading order, line detection in 90+ languages

Python 6,370 386 Updated May 5, 2024

The implementation of our EMNLP 2021 paper "Enhanced Language Representation with Label Knowledge for Span Extraction".

Python 111 13 Updated May 22, 2023

Code for preprocessing data for UD annotations and for tagging/parsing experiments of MaiBaam

Python 1 Updated Mar 13, 2024

Data and code: "Answering legal questions from laymen in German civil law system", Büttner & Habernal, EACL'24

Python 6 2 Updated Mar 2, 2024

Grok open release

Python 48,186 8,165 Updated May 2, 2024

Language models scale reliably with over-training and on downstream tasks

Jupyter Notebook 81 3 Updated Apr 2, 2024

Master's thesis project @HU-Berlin

Jupyter Notebook 2 1 Updated Dec 21, 2023

Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 24

Python 633 48 Updated Apr 25, 2024

Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation"

Python 9 1 Updated Feb 14, 2024

Lightning fast data version control system for structured and unstructured machine learning datasets. We aim to make versioning datasets as easy as versioning code.

Python 836 12 Updated Apr 1, 2024