Methods for Skill Extraction from Resumes and Job Postings
Automatic skill extraction is a key task in recruitment systems, job recommendation, and labor market analysis. The input consists of unstructured text: the "Requirements" section of a job posting or the "Experience/Skills" block of a resume. The output is expected to be a normalized list of competencies, suitable for searching, comparison, and analytics.
This article discusses the pipeline implemented in iskillmatching, which combines three complementary approaches:
- NER based on an LLM: neural named entity recognition.
- Pattern matching via spaCy: lookup against a predefined skill dictionary.
- Normalization via vector representations: mapping extracted variants to canonical forms using semantic similarity.
1. NER based on LLM (Neural Network Named Entity Recognition)
What is NER
Named Entity Recognition (NER) is a sequence labeling (token classification) task: each token in a text is assigned a label indicating whether it is part of a named entity (e.g., "technology," "skill," "organization") or not. Traditionally, NER was solved with CRFs and hand-written rules, but modern transformer-based LLMs (Large Language Models) typically achieve higher quality because they take the context of each token into account.
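To make the token-level labeling concrete, here is a minimal sketch that assumes the common BIO tagging scheme (B- opens an entity, I- continues it, O marks tokens outside any entity). The sample tokens, the SKILL label, and the helper bio_to_spans are illustrative assumptions, not part of the project code.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, label) pairs."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Close the entity being built, if any.
        if current_tokens and current_label is not None:
            spans.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_label
        )
        if starts_new:
            # A B- tag (or an I- tag with a different label) opens a new entity.
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-"):
            # Continuation of the entity currently being built.
            current_tokens.append(token)
        else:
            # "O": the token is outside any entity.
            flush()
            current_tokens, current_label = [], None
    flush()
    return spans


# Hypothetical example sentence with hand-assigned tags.
tokens = ["Experience", "with", "machine", "learning", "and", "Docker"]
tags = ["O", "O", "B-SKILL", "I-SKILL", "O", "B-SKILL"]
print(bio_to_spans(tokens, tags))
```

In practice a trained model predicts the tags; the grouping step is what turns per-token labels into whole entities such as "machine learning".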
Model Used
In ner_utils.py, the HuggingFace Transformers pipeline is used:
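from transformers import pipeline


def get_ner_extractor(model_name="dondosss/rubert-finetuned-ner"):
    """Build a token-classification pipeline for the given NER model."""
    return pipeline(
        "token-classification",
        model=model_name,
        # "simple" merges adjacent sub-word tokens into whole entities
        aggregation_strategy="simple",
    )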
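The following sketch shows one way the output of such a pipeline could be post-processed. With an aggregation strategy set, the Transformers token-classification pipeline returns a list of dicts with the keys entity_group, score, word, start, and end. The helper collect_entities, the 0.5 score threshold, and the ORG/MISC labels in the sample data are assumptions for illustration; the label set of the actual model may differ.

```python
from typing import Dict, List


def collect_entities(
    ner_output: List[dict], text: str, min_score: float = 0.5
) -> Dict[str, List[str]]:
    """Group aggregated pipeline entities by label, dropping weak predictions."""
    grouped: Dict[str, List[str]] = {}
    for ent in ner_output:
        if ent["score"] < min_score:
            continue  # skip low-confidence predictions
        start, end = ent.get("start"), ent.get("end")
        if start is not None and end is not None:
            # Slice the original text to keep the original casing and spacing.
            surface = text[start:end].strip()
        else:
            # Offsets can be missing (e.g. with slow tokenizers); use the decoded word.
            surface = ent["word"].strip()
        if not surface:
            continue
        bucket = grouped.setdefault(ent["entity_group"], [])
        if surface not in bucket:  # de-duplicate while preserving order
            bucket.append(surface)
    return grouped


# Hand-written sample in the pipeline's output format (not real model output).
text = "Worked at Yandex with Python and Docker"
sample_output = [
    {"entity_group": "ORG", "score": 0.98, "word": "yandex", "start": 10, "end": 16},
    {"entity_group": "MISC", "score": 0.81, "word": "python", "start": 22, "end": 28},
    {"entity_group": "MISC", "score": 0.30, "word": "docker", "start": 33, "end": 39},
]
result = collect_entities(sample_output, text)
print(result)

# With the real model (downloads weights on first use):
#   extractor = get_ner_extractor()
#   result = collect_entities(extractor(text), text)
```

Slicing the source text by start/end rather than using the word field avoids tokenizer artifacts such as lowercasing, and the threshold gives a simple knob for trading recall against precision.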