podium.preproc.lemmatizer package

Submodules

podium.preproc.lemmatizer.croatian_lemmatizer module

The Croatian lemmatizer module loads pre-built lemmatizer dictionaries for the Croatian language. It can return all known inflections of a lemma, or the lemma of a given inflected word.
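The two lookup directions can be pictured as a pair of inverse dictionaries. The following is only a minimal sketch on toy data: the names `lemma2word` and `word2lemma` and the sample entries are hypothetical illustrations, not the contents or format of the actual dictionary files.

```python
from collections import defaultdict

# Hypothetical toy data; the real dictionary files are not reproduced here.
lemma2word = {"riba": ["riba", "ribe", "ribi", "ribu", "ribom"]}

# Invert the mapping so that every inflected form points back to its lemma(s).
# A list is used because, in principle, one form could belong to several lemmas.
word2lemma = defaultdict(list)
for lemma, words in lemma2word.items():
    for word in words:
        word2lemma[word].append(lemma)

inflections = lemma2word["riba"]       # lemma -> all inflections
lemmas_of_form = word2lemma["ribom"]   # inflection -> lemma(s)
print(inflections)
print(lemmas_of_form)
```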

class podium.preproc.lemmatizer.croatian_lemmatizer.CroatianLemmatizer(**kwargs)

Bases: object

Class for lemmatizing words and fetching word inflections for a given lemma

BASE_FOLDER

Folder into which the lemmatizer resources are downloaded

Type

str

MOLEX14_LEMMA2WORD

dictionary file path containing lemma to words mappings

Type

str

MOLEX14_WORD2LEMMA

dictionary file path containing word to lemma mappings

Type

str

get_words_for_lemma(lemma)

Returns a list of words that share the provided lemma.

Parameters

lemma (str) – Lemma whose inflected word forms should be returned

Returns

List of words that share the provided lemma, uppercased at the same character positions as the provided lemma

Return type

list(str)

Raises

ValueError – If no words for the provided lemma are found.
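The documented behavior (case carried over from the lemma, ValueError when nothing is found) can be illustrated with a small stand-in. This is a sketch under assumptions, not the library's implementation: `TOY_LEMMA2WORD`, `uppercase_like`, and `toy_get_words_for_lemma` are hypothetical names, and the lowercase dictionary lookup is an assumed detail.

```python
# Toy stand-in for the lemma-to-words dictionary; entries are illustrative only.
TOY_LEMMA2WORD = {
    "mama": ["mama", "mame", "mami", "mamu", "mamom"],
}


def uppercase_like(template, word):
    """Uppercase the characters of `word` at positions where `template` is uppercase."""
    chars = []
    for i, ch in enumerate(word):
        if i < len(template) and template[i].isupper():
            chars.append(ch.upper())
        else:
            chars.append(ch)
    return "".join(chars)


def toy_get_words_for_lemma(lemma):
    """Hypothetical analogue of get_words_for_lemma working on toy data."""
    # Assumption: the dictionary is keyed by lowercase lemmas.
    words = TOY_LEMMA2WORD.get(lemma.lower())
    if not words:
        raise ValueError(f"No words found for lemma {lemma!r}")
    return [uppercase_like(lemma, w) for w in words]


capitalized = toy_get_words_for_lemma("Mama")
print(capitalized)
```

Calling the stand-in with a lemma that is missing from the toy dictionary raises ValueError, mirroring the Raises entry above.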

podium.preproc.lemmatizer.croatian_lemmatizer.get_croatian_lemmatizer_hook(**kwargs)

Function that obtains a Croatian lemmatizer hook.

Parameters

kwargs (dict) – Croatian lemmatizer arguments.
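The text above does not state the hook's call signature, so the sketch below is only one plausible shape. It assumes a hook that receives the raw text together with a token list and returns both with the tokens lemmatized. `ToyLemmatizer`, `get_toy_lemmatizer_hook`, and the `word2lemma` keyword are hypothetical and stand in for the real class and its arguments.

```python
from functools import partial


class ToyLemmatizer:
    """Hypothetical stand-in for a lemmatizer; not the podium class."""

    def __init__(self, word2lemma):
        self._word2lemma = word2lemma

    def lemmatize(self, word):
        # Assumed behavior: unknown words are returned unchanged.
        return self._word2lemma.get(word.lower(), word)


def _toy_hook(raw, tokens, lemmatizer):
    # Assumed hook shape: (raw, tokens) in, (raw, lemmatized tokens) out.
    return raw, [lemmatizer.lemmatize(token) for token in tokens]


def get_toy_lemmatizer_hook(**kwargs):
    """Forward keyword arguments to the lemmatizer and bind it into a hook."""
    return partial(_toy_hook, lemmatizer=ToyLemmatizer(**kwargs))


hook = get_toy_lemmatizer_hook(word2lemma={"ribe": "riba", "ribom": "riba"})
raw, tokens = hook("ribe plivaju", ["ribe", "plivaju"])
print(tokens)
```

Binding the lemmatizer once with `partial` means the dictionaries are loaded a single time and reused for every call of the hook.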

Module contents

Package contains modules for lemmatizing.

class podium.preproc.lemmatizer.CroatianLemmatizer(**kwargs)

Bases: object

Class for lemmatizing words and fetching word inflections for a given lemma

BASE_FOLDER

Folder into which the lemmatizer resources are downloaded

Type

str

MOLEX14_LEMMA2WORD

dictionary file path containing lemma to words mappings

Type

str

MOLEX14_WORD2LEMMA

dictionary file path containing word to lemma mappings

Type

str

get_words_for_lemma(lemma)

Returns a list of words that share the provided lemma.

Parameters

lemma (str) – Lemma whose inflected word forms should be returned

Returns

List of words that share the provided lemma, uppercased at the same character positions as the provided lemma

Return type

list(str)

Raises

ValueError – If no words for the provided lemma are found.