pycrossword  0.4
Pure-Python implementation of a crossword puzzle generator and editor
pycross.dbapi.HunspellImport Class Reference

Public Member Functions

def __init__ (self, settings, dbmanager=None, dicfolder=DICFOLDER)
 
def pool_running (self)
 
def pool_threadcount (self)
 
def pool_wait (self)
 
def get_installed_info (self, lang)
 
def list_hunspell (self, stopcheck=None)
 Retrieves the list of Hunspell dictionaries available for download from the public GitHub repo.
 
def list_all_dics (self, stopcheck=None)
 
def download_hunspell (self, url, lang, overwrite=True, on_stopcheck=None, on_start=None, on_getfilesize=None, on_progress=None, on_complete=None, on_error=None, wait=False)
 
def download_hunspell_all (self, dics, on_stopcheck=None, on_start=None, on_getfilesize=None, on_progress=None, on_complete=None, on_error=None)
 
def standard_posrules (self, lang)
 Returns the default Hunspell-formatted metadata patterns for the three common parts of speech (noun, verb, adjective).
 
def standard_replacements (self, lang)
 Returns the default replacement rules for a language to use in Hunspell imports.
 
def add_from_hunspell (self, lang, posrules=None, posrules_strict=False, posdelim='/', lcase=True, replacements=None, remove_hyphens=True, filter_out=None, rows=None, commit_each=1000, on_checkstop=None, on_start=None, on_word=None, on_commit=None, on_finish=None, on_error=None, wait=False)
 Imports a Hunspell-formatted dictionary file into the DB.
 
def add_all_from_hunspell (self, dics, posrules=None, posrules_strict=True, posdelim='/', lcase=True, replacements=None, remove_hyphens=True, filter_out=None, rows=None, commit_each=1000, on_stopcheck=None, on_start=None, on_word=None, on_commit=None, on_finish=None, on_error=None)
 Imports all Hunspell-formatted dictionaries found in 'assets/dic'.
 

Public Attributes

 settings
 
 db
 
 dicfolder
 
 pool
 
 timeout_
 
 proxies_
 

Constructor & Destructor Documentation

◆ __init__()

def pycross.dbapi.HunspellImport.__init__ (   self,
  settings,
  dbmanager = None,
  dicfolder = DICFOLDER 
)

Member Function Documentation

◆ add_all_from_hunspell()

def pycross.dbapi.HunspellImport.add_all_from_hunspell (   self,
  dics,
  posrules = None,
  posrules_strict = True,
  posdelim = '/',
  lcase = True,
  replacements = None,
  remove_hyphens = True,
  filter_out = None,
  rows = None,
  commit_each = 1000,
  on_stopcheck = None,
  on_start = None,
  on_word = None,
  on_commit = None,
  on_finish = None,
  on_error = None 
)

Imports all Hunspell-formatted dictionaries found in 'assets/dic'.

Warning
All imported dictionary files must have the '.dic' extension.
Parameters
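dics (iterable of dict): language info entries for the dictionaries to import (any other dictionaries found will be skipped). If None, all found dictionaries are imported; note that the signature above declares no default value, so the argument must be passed explicitly.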
See also
add_from_hunspell()

◆ add_from_hunspell()

def pycross.dbapi.HunspellImport.add_from_hunspell (   self,
  lang,
  posrules = None,
  posrules_strict = False,
  posdelim = '/',
  lcase = True,
  replacements = None,
  remove_hyphens = True,
  filter_out = None,
  rows = None,
  commit_each = 1000,
  on_checkstop = None,
  on_start = None,
  on_word = None,
  on_commit = None,
  on_finish = None,
  on_error = None,
  wait = False 
)

Imports a Hunspell-formatted dictionary file into the DB.

Hunspell dictionaries can be downloaded from LibreOffice or GitHub. Default dictionaries and prebuilt SQLite databases are found in 'assets/dic'.

Parameters
lang (str): short name of the imported dictionary language, e.g. 'en', 'de' etc.
Warning
The file must be in plain text format, with each word on a new line, optionally followed by a slash (see 'posdelim' argument) and meta-data (parts of speech etc.)
Parameters
posrules (dict): part-of-speech regular expression parsing rules in the format:
{'N': 'regex for nouns', 'V': 'regex for verbs', ...}
     Possible keys are: 'N' [noun], 'V' [verb], 'ADV' [adverb], 'ADJ' [adjective],
     'P' [participle], 'PRON' [pronoun], 'I' [interjection],
     'C' [conjunction], 'PREP' [preposition], 'PROP' [proposition],
     'MISC' [miscellaneous / other], 'NONE' [no POS]
 
posrules_strict (bool): if True, only the parts of speech present in the posrules dict are imported and all other words are skipped. If False (the default in this method's signature), such words are imported with 'MISC' and 'NONE' POS markers.
posdelim (str): delimiter separating the word from its part-of-speech metadata [default = '/']
lcase (bool): if True (default), found words are imported in lower case; otherwise the original case is kept
replacements (dict): character replacement rules in the format:
{'char_from': 'char_to', ...}
Default = None (no replacements)
remove_hyphens (bool): if True (default), all hyphens ['-'] are removed from the words
filter_out (dict): regex-based rules to filter out [exclude] words in the format:
{'word': ['regex1', 'regex2', ...], 'pos': ['regex1', 'regex2', ...]}
Matching words will not be imported. The 'pos' rules can be used to screen out specific parts of speech. Match rules for words are applied AFTER replacements, in the sequential order of the regex list. Default = None (no filter rules apply).
commit_each (int): number of insert operations after which the transaction is committed (default = 1000)
on_word (callable): callback function called when a word is imported into the DB. Callback prototype is:
on_word(lang: str, dicfile: str, word: str, part_of_speech: str, records_committed: int) -> None
on_commit (callable): callback function called when the next batch of records is written to the DB. Callback prototype is:
on_commit(lang: str, dicfile: str, records_committed: int) -> None
Returns
int number of words imported from the dictionary
See also
add_all_from_hunspell()
Parameters
on_error (callable): callback function called when an exception occurs. Callback prototype is:
on_error(lang: str, dicfile: str, error_message: str) -> None
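
The parameter semantics above can be illustrated with a small standalone sketch. It is not the pycross implementation: parse_entry and import_lines are hypothetical helpers, the flag letters in the usage line are made up, and the order of the lower-casing, replacement and hyphen-removal steps is an assumption (the documentation only states that word filters run after replacements). The callbacks follow the on_word and on_commit prototypes given above; no database is touched, commits are only counted.

```python
import re


def parse_entry(line, posrules=None, posrules_strict=False, posdelim='/',
                lcase=True, replacements=None, remove_hyphens=True,
                filter_out=None):
    """Return (word, pos) for one dictionary line, or None if it is skipped."""
    line = line.strip()
    if not line:
        return None
    # Split "word/META" at the first delimiter; the metadata part is optional.
    word, _, meta = line.partition(posdelim)
    if lcase:
        word = word.lower()
    # Replacements run before the word filters (as documented).
    for char_from, char_to in (replacements or {}).items():
        word = word.replace(char_from, char_to)
    if remove_hyphens:
        word = word.replace('-', '')
    if not word:
        return None
    rules = filter_out or {}
    if any(re.search(rx, word) for rx in rules.get('word', [])):
        return None
    if any(re.search(rx, meta) for rx in rules.get('pos', [])):
        return None
    # Assumption: no metadata maps to 'NONE', unmatched metadata to 'MISC'.
    pos = 'MISC' if meta else 'NONE'
    if meta:
        for key, rx in (posrules or {}).items():
            if re.search(rx, meta):
                pos = key
                break
    # Strict mode keeps only words whose POS is listed in posrules.
    if posrules_strict and posrules and pos not in posrules:
        return None
    return word, pos


def import_lines(lines, lang, dicfile, commit_each=1000,
                 on_word=None, on_commit=None, **parse_options):
    """Count imported words, firing callbacks the way the prototypes describe."""
    committed = 0   # records already "written"
    pending = 0     # records waiting for the next commit
    for line in lines:
        entry = parse_entry(line, **parse_options)
        if entry is None:
            continue
        pending += 1
        if on_word:
            on_word(lang, dicfile, entry[0], entry[1], committed)
        if pending >= commit_each:
            committed += pending
            pending = 0
            if on_commit:
                on_commit(lang, dicfile, committed)
    if pending:
        # Flush the last, incomplete batch.
        committed += pending
        if on_commit:
            on_commit(lang, dicfile, committed)
    return committed
```

With these helpers, parse_entry('Well-Known/MS', posrules={'N': 'M'}) yields ('wellknown', 'N'), and the same line with a non-matching rule set and posrules_strict=True is skipped.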

◆ download_hunspell()

def pycross.dbapi.HunspellImport.download_hunspell (   self,
  url,
  lang,
  overwrite = True,
  on_stopcheck = None,
  on_start = None,
  on_getfilesize = None,
  on_progress = None,
  on_complete = None,
  on_error = None,
  wait = False 
)

◆ download_hunspell_all()

def pycross.dbapi.HunspellImport.download_hunspell_all (   self,
  dics,
  on_stopcheck = None,
  on_start = None,
  on_getfilesize = None,
  on_progress = None,
  on_complete = None,
  on_error = None 
)

◆ get_installed_info()

def pycross.dbapi.HunspellImport.get_installed_info (   self,
  lang 
)

◆ list_all_dics()

def pycross.dbapi.HunspellImport.list_all_dics (   self,
  stopcheck = None 
)

◆ list_hunspell()

def pycross.dbapi.HunspellImport.list_hunspell (   self,
  stopcheck = None 
)

Retrieves the list of Hunspell dictionaries available for download from the public GitHub repo.

◆ pool_running()

def pycross.dbapi.HunspellImport.pool_running (   self)

◆ pool_threadcount()

def pycross.dbapi.HunspellImport.pool_threadcount (   self)

◆ pool_wait()

def pycross.dbapi.HunspellImport.pool_wait (   self)

◆ standard_posrules()

def pycross.dbapi.HunspellImport.standard_posrules (   self,
  lang 
)

Returns the default Hunspell-formatted metadata patterns for the three common parts of speech (noun, verb, adjective).

The returned patterns depend on the language.

Parameters
lang (str): language for which the matching patterns are requested, e.g. 'en' or 'ru'
Returns
dict POS to regex pattern matching table in the format:
{'N': 'regex pattern for nouns', 'V': 'regex pattern for verbs', 'ADJ': 'regex pattern for adjectives'}
If the language is invalid (none of 'en', 'ru', 'fr' or 'de'), None is returned.
Reimplement this method as needed to support other languages / parts of speech formats.
See also
add_from_hunspell()
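
The advice to reimplement this method for other languages can be sketched as a subclass override. To stay self-contained, the example below uses a hypothetical stand-in base class rather than pycross.dbapi.HunspellImport itself; the class names, the 'es' language and every regex pattern are made-up placeholders, not patterns shipped with the library.

```python
class HunspellImportStandIn:
    """Hypothetical stand-in for the real class; only mimics the documented contract."""

    def standard_posrules(self, lang):
        # Placeholder patterns: the real ones are language-specific.
        rules = {
            'en': {'N': r'NOUN_FLAGS', 'V': r'VERB_FLAGS', 'ADJ': r'ADJ_FLAGS'},
        }
        # Unsupported languages yield None, as documented.
        return rules.get(lang)


class SpanishImport(HunspellImportStandIn):
    """Adds one more language and defers to the base class for the rest."""

    def standard_posrules(self, lang):
        if lang == 'es':
            # Made-up flag classes, for illustration only.
            return {'N': r'[SG]', 'V': r'[RE]', 'ADJ': r'[A]'}
        return super().standard_posrules(lang)
```

The returned dict has the same shape as the posrules argument of add_from_hunspell(), so an override like this only needs to supply one regex per part of speech.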

◆ standard_replacements()

def pycross.dbapi.HunspellImport.standard_replacements (   self,
  lang 
)

Returns the default replacement rules for a language to use in Hunspell imports.

Parameters
lang (str): language for which the replacement rules are requested, e.g. 'ru' or 'fr'
Returns
dict default replacements in the format:
{'character to replace': 'replacement character'}
If the language is not supported (currently only 'ru' and 'fr' are), None is returned.
Reimplement this method as needed to add other languages / replaced characters.
See also
add_from_hunspell()
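
A replacement table in this format maps single characters to their substitutes, so it can be applied with Python's built-in str.translate. The sketch below is only an illustration: apply_replacements is a hypothetical helper, and the sample rules for Russian and French are assumptions, not necessarily the defaults this method returns.

```python
def apply_replacements(word, replacements=None):
    """Apply {'character to replace': 'replacement'} rules to a word."""
    if not replacements:
        return word
    # str.maketrans requires single-character keys; values may be longer strings.
    return word.translate(str.maketrans(replacements))


# Assumed sample rules, for illustration only.
sample_ru = {'ё': 'е'}
sample_fr = {'œ': 'oe'}
normalized_ru = apply_replacements('ёлка', sample_ru)
normalized_fr = apply_replacements('œuvre', sample_fr)
```

Because None means no replacements, the helper returns the word unchanged in that case, which matches the None result documented for unsupported languages.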

Member Data Documentation

◆ db

pycross.dbapi.HunspellImport.db

◆ dicfolder

pycross.dbapi.HunspellImport.dicfolder

◆ pool

pycross.dbapi.HunspellImport.pool

◆ proxies_

pycross.dbapi.HunspellImport.proxies_

◆ settings

pycross.dbapi.HunspellImport.settings

◆ timeout_

pycross.dbapi.HunspellImport.timeout_
