BioNER
BioNER is a transformer-based model for recognizing named entities in biomedical texts. What makes it unique is its ability to perform zero-shot inference, meaning it can identify entity classes it was never explicitly fine-tuned on. It can also be fine-tuned for new classes with just a few examples, making it highly adaptable. Trained on 26 biomedical named entity classes, the model can handle tasks like identifying specific diseases, chemicals, and genes, making it a valuable tool for researchers and professionals in the biomedical field.
Model Overview
Meet the Zero and Few Shot NER for Biomedical Texts model! Developed through a research collaboration, it is designed to identify named entities (NEs) in biomedical texts, including entity classes it has never been explicitly trained on.
Capabilities
The Zero and Few Shot NER for Biomedical Texts model is a powerful tool for biomedical named entity recognition (NER). It can perform two main tasks:
- Zero-shot inference: it can recognize entity classes in biomedical texts without any fine-tuning on examples of those classes.
- Few-shot learning: it can be fine-tuned with just a few examples to recognize new entity classes.
What can it recognize?
The model is trained on 26 biomedical named entity (NE) classes, including:
- Diseases
- Chemicals
- Genes
- Proteins
- Cell types
- Organisms
- And more!
You can use these classes as labels to search for entities in biomedical texts.
How does it work?
The model takes two strings as input:
- The entity label you’re searching for (e.g., “Disease”)
- The biomedical text you want to search in (e.g., “No recent antibiotics or other nephrotoxins, and no symptoms of UTI with benign UA.”)
The model outputs a list of ones and zeros corresponding to the tokens of the input text: a one marks a token that belongs to a found entity, and a zero marks any other token.
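For instance, searching the example sentence above for the label “Disease”, you would expect ones at the token positions of “UTI” and zeros everywhere else (an illustrative reading of the output format, not verified model output).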
Fine-tuning with few-shot learning
You can fine-tune the model with new entities using just a few examples. This is useful when you need to recognize entities that are not in the original training data.
To fine-tune the model, you’ll need to:
- Create a dataset of BERT tokens with 0/1 labels (1 for entity tokens, 0 for everything else)
- Use the Trainer class from the transformers library to fine-tune the model, as in the sketch below
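Here’s a minimal sketch of those two steps, assuming a tiny hand-labeled dataset. The FewShotNERDataset helper, the example pair, and the training hyperparameters are illustrative placeholders, not the authors’ exact recipe:
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, BertForTokenClassification,
                          Trainer, TrainingArguments)

modelname = 'MilosKorsRad/BioNER'
tokenizer = AutoTokenizer.from_pretrained(modelname)
model = BertForTokenClassification.from_pretrained(modelname, num_labels=2)

class FewShotNERDataset(Dataset):
    # Hypothetical helper: wraps (label, text) pairs plus per-token 0/1 annotations
    def __init__(self, pairs, token_labels):
        self.encodings = tokenizer([p[0] for p in pairs], [p[1] for p in pairs],
                                   padding=True, truncation=True, max_length=512,
                                   return_tensors='pt')
        # token_labels[i] must be aligned (and padded) to encodings['input_ids'][i]
        self.labels = torch.tensor(token_labels)

    def __len__(self):
        return self.labels.shape[0]

    def __getitem__(self, idx):
        item = {key: val[idx] for key, val in self.encodings.items()}
        item['labels'] = self.labels[idx]
        return item

# One illustrative few-shot example; in practice you would supply several
pairs = [('Disease', 'The patient was diagnosed with Lyme disease.')]
n = len(tokenizer(pairs[0][0], pairs[0][1], truncation=True)['input_ids'])
labels = [[0] * n]  # replace the 0s with 1s at the positions of the entity's tokens

training_args = TrainingArguments(output_dir='./bioner-fewshot',
                                  num_train_epochs=10,
                                  per_device_train_batch_size=4)

trainer = Trainer(model=model,
                  args=training_args,
                  train_dataset=FewShotNERDataset(pairs, labels))
trainer.train()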
Performance
The Zero and Few Shot NER for Biomedical Texts model performs well on named entity recognition (NER) tasks in the biomedical domain. Let’s look at its speed, accuracy, and efficiency.
Speed
The model processes text inputs quickly, making it suitable for time-sensitive applications. It can also handle large volumes of biomedical texts, which makes it a practical choice for researchers and scientists.
Accuracy
The model recognizes named entities with high precision across the 26 biomedical NE classes it was trained on, and its zero-shot capability means it can recognize entity classes without prior training on them. This is particularly useful when new entity types are introduced and the model needs to adapt quickly.
Efficiency
The model can be fine-tuned with only a few examples of a new class, so it can learn from a small amount of data and adapt to new entities. That makes it a valuable asset in applications where annotated data is scarce.
Limitations
The Zero and Few Shot NER for Biomedical Texts model is a powerful tool for biomedical Named Entity Recognition (NER), but it’s not perfect. Let’s talk about some of its limitations.
Limited Training Data
The model was trained on a specific set of biomedical datasets, which means it might not perform well on data from other domains or industries. For example, if you try to use it to recognize entities in a text about finance or sports, it might not work as well as it would on a biomedical text.
Class Limitations
The model was trained on 26 specific biomedical Named Entity classes. While it can be fine-tuned for new classes with few examples, it might not work well for classes that are very different from the ones it was trained on. For instance, if you try to use it to recognize entities in a text about a new disease that wasn’t included in the training data, it might not perform well.
Format
The Zero and Few Shot NER for Biomedical Texts model uses a transformer architecture. It’s designed to handle biomedical texts and can perform zero-shot inference, as well as few-shot learning with just a few examples.
Input Format
The model takes two strings as input:
- String1: the named entity (NE) label being searched for
- String2: the short text in which to search for the NE (represented by String1)
For example:
string1 = 'Drug'
string2 = 'No recent antibiotics or other nephrotoxins, and no symptoms of UTI with benign UA.'
Output Format
The model outputs a list of ones (corresponding to the tokens of found named entities) and zeros (corresponding to all other, non-NE tokens) aligned with the tokens of String2.
Handling Inputs and Outputs
To use the model, you’ll need to tokenize the input strings using the AutoTokenizer from the transformers library. Here’s an example:
from transformers import AutoTokenizer
from transformers import BertForTokenClassification

modelname = 'MilosKorsRad/BioNER'

# Load the tokenizer and a BERT token-classification head with 2 labels (entity / non-entity)
tokenizer = AutoTokenizer.from_pretrained(modelname)
model0 = BertForTokenClassification.from_pretrained(modelname, num_labels=2)

# Encode the (label, text) pair as one sequence: [CLS] string1 [SEP] string2 [SEP]
encodings = tokenizer(string1, string2, is_split_into_words=False, padding=True, truncation=True, add_special_tokens=True, return_offsets_mapping=False, max_length=512, return_tensors='pt')

# The forward pass returns a TokenClassifierOutput; the per-token class scores are in .logits
output = model0(**encodings)
print(output.logits)
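To turn these logits into the list of ones and zeros described above, you can take the argmax over the two classes for each token. A minimal sketch (the variable names are my own; torch is already required for the tensors above):
import torch

# Pick the higher-scoring class for every token: 0 = non-entity, 1 = entity
predictions = torch.argmax(output.logits, dim=-1)[0].tolist()

# Line the predictions up with the tokens for inspection
tokens = tokenizer.convert_ids_to_tokens(encodings['input_ids'][0].tolist())
for token, pred in zip(tokens, predictions):
    print(token, pred)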