Arabic module
Author: Taha Zerrouki
Contact: taha dot zerrouki at gmail dot com
Copyright: Arabtechies, Arabeyes, Taha Zerrouki
License: GPL
Date: 2010/03/01
Version: 0.1

Is letter functions

General letter functions (return types: integer; unicode; unicode)

Has letter functions

Word and text functions (the last three return Boolean)

Char functions (each returns a unicode char)

Strip functions (most return unicode)

Variables

COMMA, SEMICOLON, QUESTION
HAMZA, ALEF_MADDA, ALEF_HAMZA_ABOVE, WAW_HAMZA, ALEF_HAMZA_BELOW, YEH_HAMZA, ALEF, BEH, TEH_MARBUTA, TEH, THEH, JEEM, HAH, KHAH, DAL, THAL, REH, ZAIN, SEEN, SHEEN, SAD, DAD, TAH, ZAH, AIN, GHAIN, TATWEEL, FEH, QAF, KAF, LAM, MEEM, NOON, HEH, WAW, ALEF_MAKSURA, YEH
MADDA_ABOVE, HAMZA_ABOVE, HAMZA_BELOW
ZERO, ONE, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE
PERCENT, DECIMAL, THOUSANDS, STAR, MINI_ALEF, ALEF_WASLA, FULL_STOP, BYTE_ORDER_MARK
FATHATAN, DAMMATAN, KASRATAN, FATHA, DAMMA, KASRA, SHADDA, SUKUN
SMALL_ALEF, SMALL_WAW, SMALL_YEH
LAM_ALEF, LAM_ALEF_HAMZA_ABOVE, LAM_ALEF_HAMZA_BELOW, LAM_ALEF_MADDA_ABOVE
simple_LAM_ALEF, simple_LAM_ALEF_HAMZA_ABOVE, simple_LAM_ALEF_HAMZA_BELOW, simple_LAM_ALEF_MADDA_ABOVE
LETTERS, TASHKEEL, HARAKAT, SHORTHARAKAT, TANWIN, LIGUATURES, HAMZAT, ALEFAT, WEAK, YEHLIKE, WAWLIKE, TEHLIKE, SMALL, MOON, SUN, AlphabeticOrder, NAMES
HARAKAT_pattern, TASHKEEL_pattern, HAMZAT_pattern, ALEFAT_pattern, LIGUATURES_pattern (compiled with re.compile)
__package__

Checks for the Arabic Sukun mark.

Checks for the Arabic Shadda mark.

Checks for the Arabic Tatweel letter modifier.

Checks for Arabic Tanwin marks (FATHATAN, DAMMATAN, KASRATAN).

Checks for Arabic Tashkeel marks (FATHA, DAMMA, KASRA, SUKUN, SHADDA, FATHATAN, DAMMATAN, KASRATAN).

Checks for Arabic Harakat marks (FATHA, DAMMA, KASRA, SUKUN, TANWIN).

Checks for Arabic short Harakat marks (FATHA, DAMMA, KASRA, SUKUN).
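
The groupings above nest: short Harakat plus Tanwin form the Harakat, and Harakat plus Shadda form the Tashkeel. A minimal Python sketch of these character checks, assuming the standard Unicode code points for the marks (U+064B to U+0652) and Tatweel (U+0640); the function names are hypothetical and not necessarily this module's API:

```python
# Hypothetical sketch of the diacritic predicates; names are illustrative.
# Code points are the standard Unicode assignments (assumed to match the module).
FATHATAN, DAMMATAN, KASRATAN = "\u064B", "\u064C", "\u064D"
FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"
SHADDA, SUKUN = "\u0651", "\u0652"
TATWEEL = "\u0640"

TANWIN = frozenset({FATHATAN, DAMMATAN, KASRATAN})
SHORT_HARAKAT = frozenset({FATHA, DAMMA, KASRA, SUKUN})
HARAKAT = SHORT_HARAKAT | TANWIN   # Harakat exclude SHADDA
TASHKEEL = HARAKAT | {SHADDA}      # all eight marks


def is_sukun(ch: str) -> bool:
    return ch == SUKUN


def is_shadda(ch: str) -> bool:
    return ch == SHADDA


def is_tatweel(ch: str) -> bool:
    return ch == TATWEEL


def is_tanwin(ch: str) -> bool:
    return ch in TANWIN


def is_tashkeel(ch: str) -> bool:
    return ch in TASHKEEL


def is_haraka(ch: str) -> bool:
    return ch in HARAKAT


def is_short_haraka(ch: str) -> bool:
    return ch in SHORT_HARAKAT
```

Using frozensets keeps each check a constant-time membership test and makes the nesting of the groups explicit.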

Checks for Arabic ligatures like Lam-Alef (LAM_ALEF, LAM_ALEF_HAMZA_ABOVE, LAM_ALEF_HAMZA_BELOW, LAM_ALEF_MADDA_ABOVE).

Checks for Arabic Hamza forms. HAMZAT are (HAMZA, WAW_HAMZA, YEH_HAMZA, HAMZA_ABOVE, HAMZA_BELOW, ALEF_HAMZA_BELOW, ALEF_HAMZA_ABOVE).

Checks for Arabic Alef forms. ALEFAT are (ALEF, ALEF_MADDA, ALEF_HAMZA_ABOVE, ALEF_HAMZA_BELOW, ALEF_WASLA, ALEF_MAKSURA).
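
A sketch of the ligature, Hamza and Alef membership checks, built from the groups listed above. The code points are the standard Unicode assignments (the Lam-Alef entries use the isolated presentation forms U+FEF5 to U+FEFB); that the module uses exactly these values, and the function names, are assumptions:

```python
# Hypothetical sketch; names are illustrative, code points are standard Unicode.
HAMZA = "\u0621"
ALEF_MADDA = "\u0622"
ALEF_HAMZA_ABOVE = "\u0623"
WAW_HAMZA = "\u0624"
ALEF_HAMZA_BELOW = "\u0625"
YEH_HAMZA = "\u0626"
ALEF = "\u0627"
ALEF_MAKSURA = "\u0649"
HAMZA_ABOVE = "\u0654"
HAMZA_BELOW = "\u0655"
ALEF_WASLA = "\u0671"
# Isolated presentation forms of the Lam-Alef ligatures (assumed choice)
LAM_ALEF = "\uFEFB"
LAM_ALEF_HAMZA_ABOVE = "\uFEF7"
LAM_ALEF_HAMZA_BELOW = "\uFEF9"
LAM_ALEF_MADDA_ABOVE = "\uFEF5"

HAMZAT = frozenset({HAMZA, WAW_HAMZA, YEH_HAMZA, HAMZA_ABOVE, HAMZA_BELOW,
                    ALEF_HAMZA_BELOW, ALEF_HAMZA_ABOVE})
ALEFAT = frozenset({ALEF, ALEF_MADDA, ALEF_HAMZA_ABOVE, ALEF_HAMZA_BELOW,
                    ALEF_WASLA, ALEF_MAKSURA})
LIGATURES = frozenset({LAM_ALEF, LAM_ALEF_HAMZA_ABOVE, LAM_ALEF_HAMZA_BELOW,
                       LAM_ALEF_MADDA_ABOVE})


def is_hamza(ch: str) -> bool:
    return ch in HAMZAT


def is_alef(ch: str) -> bool:
    return ch in ALEFAT


def is_ligature(ch: str) -> bool:
    return ch in LIGATURES
```

Note that ALEF_HAMZA_ABOVE and ALEF_HAMZA_BELOW belong to both HAMZAT and ALEFAT, exactly as the two lists above state.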

Checks for Arabic Yeh forms. Yeh forms: YEH, YEH_HAMZA, SMALL_YEH, ALEF_MAKSURA.

Checks for Arabic Waw-like forms. Waw forms: WAW, WAW_HAMZA, SMALL_WAW.

Checks for Arabic Teh forms. Teh forms: TEH, TEH_MARBUTA.

Checks for Arabic small letters. Small letters: SMALL_ALEF, SMALL_WAW, SMALL_YEH.

Checks for Arabic weak letters. Weak letters: ALEF, WAW, YEH, ALEF_MAKSURA.

Checks for Arabic Moon letters. Moon letters:

Checks for Arabic Sun letters. Sun letters:

Return the order of an Arabic letter, between 1 and 29. Alef is 1, Yeh is 28 and Hamza is 29; Teh Marbuta has the same order as Teh, 3.

Return the name of an Arabic letter, in Arabic. Alef order is 1, Yeh is 28 and Hamza is 29; Teh Marbuta has the same order as Teh, 3.

Return a list of Arabic characters: the characters from \u060c to \u0652.
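
A sketch of the letter-order and character-range functions. It assumes the standard hija'i order of the 28 letters (the module's own AlphabeticOrder table is not reproduced here), and the names `letter_order` and `arabic_range`, as well as returning 0 for a non-letter, are hypothetical choices:

```python
# Hypothetical sketch; assumes the standard hija'i order of the 28 letters.
ALPHABET = (
    "\u0627\u0628\u062A\u062B\u062C\u062D\u062E\u062F\u0630\u0631"  # ا ب ت ث ج ح خ د ذ ر
    "\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063A\u0641"  # ز س ش ص ض ط ظ ع غ ف
    "\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u064A"              # ق ك ل م ن ه و ي
)
HAMZA = "\u0621"
TEH, TEH_MARBUTA = "\u062A", "\u0629"


def letter_order(ch: str) -> int:
    """Return 1..29 for an Arabic letter, or 0 if it is not one (assumed)."""
    if len(ch) != 1:
        return 0
    if ch == HAMZA:
        return 29
    if ch == TEH_MARBUTA:
        ch = TEH  # Teh Marbuta shares Teh's order, 3
    idx = ALPHABET.find(ch)
    return idx + 1 if idx >= 0 else 0


def arabic_range() -> list:
    """Return the characters from U+060C to U+0652, inclusive."""
    return [chr(cp) for cp in range(0x060C, 0x0653)]
```

`range` excludes its stop value, so the stop is 0x0653 to include U+0652 itself.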

Checks if the Arabic word contains a shadda.

Checks if the Arabic word is vocalized. The word must not contain any spaces or punctuation.

Checks if the Arabic text is vocalized. The text can contain many words and spaces.
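
A sketch of the shadda and vocalization checks. The descriptions above do not define "vocalized" precisely; this sketch assumes it means "carries at least one tashkeel mark", and that a text counts as vocalized if any of its words is. Both rules and all names are assumptions:

```python
# Hypothetical sketch; "vocalized" is assumed to mean "has at least one mark".
TASHKEEL = frozenset(chr(cp) for cp in range(0x064B, 0x0653))  # FATHATAN..SUKUN
SHADDA = "\u0651"


def has_shadda(word: str) -> bool:
    return SHADDA in word


def is_vocalized(word: str) -> bool:
    # A single word: reject input that contains whitespace.
    if any(ch.isspace() for ch in word):
        return False
    return any(ch in TASHKEEL for ch in word)


def is_vocalized_text(text: str) -> bool:
    # A text may contain several words; assumed vocalized if any word is.
    return any(is_vocalized(w) for w in text.split())
```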

Checks for characters of the standard Arabic Unicode block. An Arabic string can contain spaces, digits and punctuation, but only standard Arabic characters, not extended Arabic ones.

Checks for characters of the Arabic Unicode block.

Checks for a valid Arabic word. An Arabic word does not contain spaces, digits or punctuation. To catch some spelling errors, TEH_MARBUTA may only appear at the end of the word.
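
A sketch of the block and word checks using regular expressions. It assumes that "Arabic Unicode block" means U+0600 to U+06FF and that a valid word consists only of the letters and marks from U+0621 to U+0652; the names are hypothetical. Trailing marks are stripped before the Teh Marbuta test so that a final "ةُ" is still accepted:

```python
import re

# Hypothetical sketch; the ranges below are assumptions about the module's rules.
ARABIC_BLOCK = re.compile("[\u0600-\u06FF]+")    # the whole Arabic block
ARABIC_LETTERS = re.compile("[\u0621-\u0652]+")  # letters and tashkeel only
TASHKEEL_CHARS = "".join(chr(cp) for cp in range(0x064B, 0x0653))
TEH_MARBUTA = "\u0629"


def is_arabic_range(text: str) -> bool:
    return ARABIC_BLOCK.fullmatch(text) is not None


def is_arabic_word(word: str) -> bool:
    # No spaces, digits or punctuation: everything must be a letter or a mark.
    if ARABIC_LETTERS.fullmatch(word) is None:
        return False
    # Ignore trailing marks, then allow TEH_MARBUTA only as the last letter.
    core = word.rstrip(TASHKEEL_CHARS)
    return TEH_MARBUTA not in core[:-1]
```

Arabic-Indic digits (U+0660 to U+0669) fall inside the block but outside U+0621 to U+0652, so the first function accepts them and the second rejects them.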

Return the first char.

Return the second char.

Return the last letter. Example: in "zerrouki", 'i' is the last letter.

Return the second-last letter. Example: in "zerrouki", 'k' is the second-last letter.
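
A sketch of the char functions with string slicing. The names are hypothetical, and the sketch works on code points, so a trailing haraka would count as the "last letter"; whether the module skips marks is not stated:

```python
# Hypothetical sketch; slicing returns "" instead of raising on short input.
def first_char(word: str) -> str:
    return word[:1]


def second_char(word: str) -> str:
    return word[1:2]


def last_char(word: str) -> str:
    return word[-1:]


def second_last_char(word: str) -> str:
    return word[-2:-1]
```

Slicing rather than indexing (`word[-1]`) is a design choice here: a one-letter or empty word yields an empty string instead of an IndexError.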

Strip Harakat from an Arabic word, except Shadda. The stripped marks are:
Example:
    >>> text = u"الْعَرَبِيّةُ"
    >>> stripTashkeel(text)
    العربيّة

Strip vowels from a text, including Shadda. The stripped marks are:
Example:
    >>> text = u"الْعَرَبِيّةُ"
    >>> stripTashkeel(text)
    العربية
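
The two strip operations above differ only in whether Shadda (U+0651) is kept. A sketch with regular-expression character classes, assuming the standard mark code points U+064B to U+0652; the names `strip_harakat` and `strip_tashkeel` are hypothetical:

```python
import re

# Every mark except SHADDA: FATHATAN..KASRA (U+064B..U+0650) plus SUKUN (U+0652)
HARAKAT_RE = re.compile("[\u064B-\u0650\u0652]")
# All eight marks, SHADDA included
TASHKEEL_RE = re.compile("[\u064B-\u0652]")


def strip_harakat(text: str) -> str:
    """Remove vowel marks but keep Shadda."""
    return HARAKAT_RE.sub("", text)


def strip_tashkeel(text: str) -> str:
    """Remove all vowel marks, Shadda included."""
    return TASHKEEL_RE.sub("", text)
```

Applied to the example word "الْعَرَبِيّةُ", the first function yields "العربيّة" and the second "العربية", matching the outputs shown above.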

Strip tatweel from a text and return the resulting text.
Example:
    >>> text = u"العـــــربية"
    >>> stripTatweel(text)
    العربية
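
Since Tatweel is a single code point (U+0640), a sketch of this function needs only `str.replace`; the name `strip_tatweel` is hypothetical:

```python
TATWEEL = "\u0640"  # ARABIC TATWEEL, the kashida elongation character


def strip_tatweel(text: str) -> str:
    """Remove every Tatweel from the text."""
    return text.replace(TATWEEL, "")
```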

Normalize Lam-Alef ligatures into two letters (LAM and ALEF) and return the resulting text. Some systems present the Lam-Alef ligature as a single letter; this function converts it into two letters. The ligatures converted into LAM and ALEF are:
Example:
    >>> text = u"لانها لالء الاسلام"
    >>> normalizeLigature(text)
    لانها لالئ الاسلام
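
A sketch of ligature normalization with a translation table. It assumes the ligatures are the isolated presentation forms U+FEF5, U+FEF7, U+FEF9 and U+FEFB; the final forms (U+FEF6, U+FEF8, U+FEFA, U+FEFC) are not handled here. The name `normalize_ligature` is hypothetical:

```python
# Hypothetical sketch: map each isolated Lam-Alef ligature to LAM + an Alef form.
LIGATURE_MAP = {
    "\uFEFB": "\u0644\u0627",  # LAM_ALEF             -> LAM + ALEF
    "\uFEF7": "\u0644\u0623",  # LAM_ALEF_HAMZA_ABOVE -> LAM + ALEF_HAMZA_ABOVE
    "\uFEF9": "\u0644\u0625",  # LAM_ALEF_HAMZA_BELOW -> LAM + ALEF_HAMZA_BELOW
    "\uFEF5": "\u0644\u0622",  # LAM_ALEF_MADDA_ABOVE -> LAM + ALEF_MADDA
}
# str.maketrans accepts single-character keys mapped to strings of any length.
_TABLE = str.maketrans(LIGATURE_MAP)


def normalize_ligature(text: str) -> str:
    return text.translate(_TABLE)
```

An alternative is `unicodedata.normalize("NFKC", text)`, which also decomposes these presentation forms, but it rewrites every other compatibility character in the text as well, so an explicit table is more targeted.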

Standardize the Hamzat into one form of hamza, and replace Madda with hamza and alef. Replace the Lam-Alef ligatures with simplified letters.
Example:
    >>> text = u"سئل أحد الأئمة"
    >>> normalizeHamza(text)
    سءل ءحد الءءمة
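
A sketch of the Hamza normalization: first expand ALEF_MADDA into HAMZA + ALEF, then collapse every remaining Hamza form from the HAMZAT list into a bare HAMZA. The Lam-Alef simplification step is not covered, and the name `normalize_hamza` is hypothetical:

```python
import re

HAMZA = "\u0621"
ALEF = "\u0627"
ALEF_MADDA = "\u0622"
# ALEF_HAMZA_ABOVE, WAW_HAMZA, ALEF_HAMZA_BELOW, YEH_HAMZA, HAMZA_ABOVE, HAMZA_BELOW
HAMZA_FORMS_RE = re.compile("[\u0623\u0624\u0625\u0626\u0654\u0655]")


def normalize_hamza(text: str) -> str:
    # Madda first, so its Alef survives as a plain ALEF.
    text = text.replace(ALEF_MADDA, HAMZA + ALEF)
    return HAMZA_FORMS_RE.sub(HAMZA, text)
```

On the example "سئل أحد الأئمة" this produces "سءل ءحد الءءمة", the output shown above.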

Separate the letters from the vowels in an Arabic word. If a letter has no haraka, the undefined haraka is assigned to it. Returns (letters, vowels).
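
A sketch of the letter/vowel separation. The value of the "undefined haraka" is not given above; this sketch assumes Tatweel (U+0640) as the placeholder and makes it a parameter. It also assumes at most one mark per letter: a second mark (for example the fatha after a shadda) or a mark before the first letter is dropped. The name `separate_letters_vowels` is hypothetical:

```python
# Hypothetical sketch; the placeholder and one-mark-per-letter rule are assumptions.
TASHKEEL = frozenset(chr(cp) for cp in range(0x064B, 0x0653))  # FATHATAN..SUKUN
TATWEEL = "\u0640"


def separate_letters_vowels(word: str, undefined: str = TATWEEL):
    letters, vowels = [], []
    for ch in word:
        if ch in TASHKEEL:
            # Attach the mark to the previous letter if it has none yet;
            # otherwise (leading or second mark) it is dropped.
            if letters and vowels[-1] == undefined:
                vowels[-1] = ch
        else:
            letters.append(ch)
            vowels.append(undefined)
    return "".join(letters), "".join(vowels)
```

Both returned strings have the same length, so position i of the vowels string always describes letter i.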
Return True if the two words have the same letters and the same harakat. The two words can be fully or partially vocalized.

Return True if word1 is like a wazn (pattern): the letters must be equal, except that the wazn's FEH, AIN and LAM letters act as generic letters. The two words can be fully or partially vocalized.

Return True if the two words have the same letters and the same harakat. The first word is partially vocalized and the second is fully vocalized; if the partially vocalized word contains a shadda, it must be at the same place in the fully vocalized one.
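
A sketch of the first two comparisons. It assumes that "the same harakat" for partially vocalized words means each pair of marks is either equal or has at least one side unmarked, and it reuses the one-mark-per-letter separation with Tatweel as the "no haraka" placeholder. All names and both rules are assumptions:

```python
# Hypothetical sketch of the vocalized-like and wazn-like comparisons.
TASHKEEL = frozenset(chr(cp) for cp in range(0x064B, 0x0653))
UNDEFINED = "\u0640"  # placeholder for "no haraka" (assumed)
FEH, AIN, LAM = "\u0641", "\u0639", "\u0644"  # generic letters of a wazn


def _separate(word):
    letters, vowels = [], []
    for ch in word:
        if ch in TASHKEEL:
            if letters and vowels[-1] == UNDEFINED:
                vowels[-1] = ch
        else:
            letters.append(ch)
            vowels.append(UNDEFINED)
    return letters, vowels


def _marks_compatible(v1, v2):
    # An unvocalized letter is compatible with any mark.
    return all(a == b or UNDEFINED in (a, b) for a, b in zip(v1, v2))


def vocalized_like(word1: str, word2: str) -> bool:
    l1, v1 = _separate(word1)
    l2, v2 = _separate(word2)
    return l1 == l2 and _marks_compatible(v1, v2)


def wazn_like(word: str, wazn: str) -> bool:
    l1, v1 = _separate(word)
    l2, v2 = _separate(wazn)
    if len(l1) != len(l2):
        return False
    # FEH, AIN and LAM in the wazn match any letter; others must be equal.
    letters_ok = all(p in (FEH, AIN, LAM) or w == p for w, p in zip(l1, l2))
    return letters_ok and _marks_compatible(v1, v2)
```

For example, "مكتب" matches the wazn "مفعل": the MEEM must match exactly, while FEH, AIN and LAM stand in for KAF, TEH and BEH.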
Reduce the Tashkeel by deleting the evident cases.

Generated by Epydoc 3.0.1 on Tue Mar 27 11:48:59 2012 (http://epydoc.sourceforge.net)