Methods for Skill Extraction from Resumes and Job Postings

Automatic skill extraction is a key task in recruitment systems, job recommendation, and labor market analysis. The input consists of unstructured text: the "Requirements" section of a job posting or the "Experience/Skills" block of a resume. The output is expected to be a normalized list of competencies, suitable for searching, comparison, and analytics.

This article discusses the pipeline implemented in iskillmatching, which combines three complementary approaches:

LLM-based NER: named entity recognition with a neural network.
Pattern matching via spaCy: lookup against a predefined skill dictionary.
Normalization via vector representations: mapping extracted variants to canonical forms by semantic similarity.

1. LLM-Based NER (Neural Named Entity Recognition)

What is NER

Named Entity Recognition (NER) is a sequence labeling (token classification) task: each token in a text is assigned a label that says whether it belongs to a named entity (e.g., "technology," "skill," "organization") and, if so, to which type. Earlier NER systems relied on hand-written rules and conditional random fields (CRFs); modern transformer-based language models generally achieve higher accuracy because they predict each token's label from its full surrounding context.
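To make the token-labeling idea concrete, here is a minimal sketch of how per-token labels are turned into entity spans. It assumes the common BIO scheme (B- opens an entity, I- continues it, O is outside any entity) and a hypothetical SKILL label; the actual label set of the model used in iskillmatching is not stated here, and the function name decode_bio is illustrative only.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group token-level BIO labels into (entity_text, entity_type) spans."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the entity being built (if any) and reset the buffer.
        nonlocal current_tokens, current_type
        if current_tokens:
            spans.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A B- tag always starts a new entity.
            flush()
            current_tokens, current_type = [token], label[2:]
        elif label.startswith("I-") and current_type == label[2:]:
            # An I- tag extends the open entity of the same type.
            current_tokens.append(token)
        else:
            # "O" or an orphan I- tag: close the open entity, skip the token.
            flush()
    flush()
    return spans


# Hypothetical example; the SKILL label is an assumption.
tokens = ["Experience", "with", "Apache", "Kafka", "and", "Python"]
labels = ["O", "O", "B-SKILL", "I-SKILL", "O", "B-SKILL"]
print(decode_bio(tokens, labels))
```

In a real model the labels come from a classifier head applied to every token; the grouping step stays the same.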

Model Used

In ner_utils.py, the extractor is built with the Hugging Face Transformers pipeline API:

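from transformers import pipeline

def get_ner_extractor(model_name="dondosss/rubert-finetuned-ner"):
    """Build a token-classification (NER) pipeline for the given model."""
    return pipeline(
        "token-classification",
        model=model_name,
        # "simple" merges adjacent tokens with the same entity label into one span
        aggregation_strategy="simple",
    )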
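With aggregation_strategy="simple", the pipeline returns a list of dictionaries with the keys entity_group, score, word, start, and end. The sketch below shows one possible way to post-process that output: drop low-confidence predictions and remove case-insensitive duplicates. The helper collect_entities, the 0.5 threshold, the SKILL label, and the sample predictions are all assumptions for illustration, not part of ner_utils.py; the predictions are hand-written so the block runs without downloading a model.

```python
from typing import Dict, List


def collect_entities(
    predictions: List[Dict], text: str, min_score: float = 0.5
) -> List[Dict]:
    """Filter aggregated NER predictions by score and drop duplicates."""
    results: List[Dict] = []
    seen = set()
    for pred in predictions:
        if pred["score"] < min_score:
            continue  # skip low-confidence predictions
        # Slice the source text by character offsets rather than using
        # pred["word"], which may differ from the original spelling.
        surface = text[pred["start"]:pred["end"]]
        key = (surface.lower(), pred["entity_group"])
        if key in seen:
            continue  # same entity already collected (case-insensitive)
        seen.add(key)
        results.append(
            {
                "text": surface,
                "label": pred["entity_group"],
                "score": float(pred["score"]),
            }
        )
    return results


# Hand-written stand-in for pipeline output (hypothetical values).
text = "Python, Docker and python scripting"
predictions = [
    {"entity_group": "SKILL", "score": 0.98, "word": "python", "start": 0, "end": 6},
    {"entity_group": "SKILL", "score": 0.95, "word": "docker", "start": 8, "end": 14},
    {"entity_group": "SKILL", "score": 0.97, "word": "python", "start": 19, "end": 25},
    {"entity_group": "SKILL", "score": 0.30, "word": "scripting", "start": 26, "end": 35},
]
entities = collect_entities(predictions, text)
print([e["text"] for e in entities])
```

With a real model, predictions would come from calling the extractor on the text, for example get_ner_extractor()(text). Character offsets are typically available only when the model ships a fast tokenizer, so the slicing step may need a fallback to the word field otherwise.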