Cloud Computing Architecture for Tagging Arabic Text Using a Hybrid Model

With the increasing role of technology in transferring information in our daily lives, Arabic has become the fourth most used language on the Internet. Therefore, to develop information systems for Arabic, we need to determine the syntax and semantics of a text efficiently and accurately. Part-of-speech (POS) tagging is one of the primary methods employed to develop any language corpus. Each language has a set of tags that are applied in different applications, such as natural language processing (NLP), speech synthesis, and information extraction. One of the main benefits of adopting cloud computing services is that storing company data costs less and takes less time than with traditional methods. This paper presents and deploys a cloud computing architecture for tagging Arabic text using a hybrid model, which helps reduce effort and cost. The results show an excellent accuracy rate in tagging Arabic text and a quick response. The tagger is compared with previous studies based on relevant rating factors; those studies achieved accuracy, precision, and recall rates of more than 95%. The cloud computing tagger attained an accuracy of 99.2%.


Introduction
Arabic is attracting growing interest in the NLP community because it is the official language of over 400 million native speakers (Oueslati et al., 2020). Figure 1 shows the share of Internet users for the most common languages used on the Internet in January 2020, according to Statista (Statista, 2020). Arabic is the fourth most used language on the Internet. With the increasing role of technology in our daily lives, the Arabic language faces significant challenges and threats, which stem from its different dialects and from the influence of other languages. Arabic is the only language for communicating with the Arab world (Maulud et al., 2021). Hence, companies, organizations, and business people resort to it when addressing the Arab world or advertising a particular commodity or product. It is also the only language that allows researchers and scholars interested in the Arab world to understand its history, culture, and civilization. Today, the Arabic language faces challenges, some of which are widespread in Arab societies and affect the language negatively by reducing its presence in the everyday speech of its people (Alyafeai et al., 2021). Perhaps the most important of these challenges is the wide spread of dialects and slang in everyday life and, to a large extent, in the media, which threatens the mother tongue and relegates it to a secondary status.

Figure 1. Languages used on the Internet (% globally).
Part-of-speech (POS) tagging is the task of assigning each word in a sentence a unique part-of-speech tag such as noun, verb, adverb, pronoun, or preposition. The tag indicates the syntactic category of the word based on its context, which resolves lexical ambiguity. Each language has a set of tags that are applied in different applications, such as natural language processing, speech synthesis, and information extraction. POS tagging is one of the primary methods employed to develop any language corpus. Several approaches are used to build POS taggers, such as rule-based, statistical, and neural network approaches, as shown in Figure 2. In the rule-based method, linguists write knowledge rules that define precisely how to assign the various POS tags. Several statistical taggers have been built, for example using Hidden Markov Models (HMM). Many studies have also used neural networks for POS tagging successfully (Yousif J., 2013).
Cloud computing is developing rapidly to serve several areas, such as economic, social, and scientific applications. In addition, it helps companies keep their online business easy and efficient. One of the main benefits of adopting cloud computing is that storing company data costs less and takes less time than with traditional methods. This paper presents and deploys a cloud computing architecture for tagging Arabic text using a hybrid model, which helps reduce effort and cost.
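To make the statistical approach mentioned above more concrete, the following is a minimal sketch of Viterbi decoding for a bigram Hidden Markov Model tagger. It is not the tagger proposed in this paper; the tag names, the example sentence, and all probability values are illustrative assumptions chosen by hand rather than estimated from a corpus.

```python
import math

# Hypothetical toy model: three tags and hand-picked probabilities.
TAGS = ["NOUN", "VERB", "PART"]
START_P = {"NOUN": 0.4, "VERB": 0.5, "PART": 0.1}
TRANS_P = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.2, "PART": 0.5},
    "VERB": {"NOUN": 0.7, "VERB": 0.05, "PART": 0.25},
    "PART": {"NOUN": 0.8, "VERB": 0.15, "PART": 0.05},
}
EMIT_P = {
    "NOUN": {"ذهب": 0.1, "الولد": 0.4, "المدرسة": 0.4},
    "VERB": {"ذهب": 0.6},
    "PART": {"إلى": 0.7},
}


def viterbi(words, tags=TAGS, start_p=START_P, trans_p=TRANS_P,
            emit_p=EMIT_P, unk=1e-6):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    # best[i][t] = (log-probability of the best path ending in tag t, previous tag)
    best = [{}]
    for t in tags:
        p = start_p[t] * emit_p[t].get(words[0], unk)
        best[0][t] = (math.log(p), None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            e = emit_p[t].get(words[i], unk)  # small probability for unseen words
            best[i][t] = max(
                (best[i - 1][prev][0] + math.log(trans_p[prev][t] * e), prev)
                for prev in tags
            )
    # Backtrack from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


sentence = ["ذهب", "الولد", "إلى", "المدرسة"]  # "the boy went to school"
print(viterbi(sentence))  # ['VERB', 'NOUN', 'PART', 'NOUN']
```

In this toy model the ambiguous word "ذهب" is tagged as a verb because the verb reading has the higher combined start and emission probability; a real tagger would estimate these values from an annotated corpus.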

Arabic Language and POS
The Arabic alphabet consists of 28 letters. Arabic text is written from right to left. Diacritics enable disambiguation (Hifny Y., 2021), since several Arabic words share the same letters but have completely different meanings, such as "ذهب" (Dahab), which can be read as the noun "gold" or the verb "went".
Arabic uses diacritics to clarify the pronunciation of words. Four diacritics mark short sounds:
- fatHa: a mark placed above a letter to indicate a short "a" sound (as in "apple").
- Dhamma: a mark placed above a letter to indicate a short "u" sound (as in "rudimentary").
- kasra: a mark placed below a letter to indicate a short "i" sound (as in "intake").
- Sukun: a small circle placed above a letter to indicate that the consonant is not followed by a vowel.
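The role of these four diacritics can be illustrated with a short sketch. The code points below are those of the Unicode Arabic block; the helper names and the two example words are assumptions introduced here for illustration only.

```python
# Unicode code points of the four diacritics listed above.
DIACRITICS = {
    "\u064E": "fatHa",
    "\u064F": "Dhamma",
    "\u0650": "kasra",
    "\u0652": "Sukun",
}


def strip_diacritics(word: str) -> str:
    """Remove the four short-sound diacritics, keeping only the base letters."""
    return "".join(ch for ch in word if ch not in DIACRITICS)


def list_diacritics(word: str) -> list:
    """Return the names of the diacritics in the word, in order of appearance."""
    return [DIACRITICS[ch] for ch in word if ch in DIACRITICS]


# Two diacritized words that share the same three base letters.
went = "\u0630\u064E\u0647\u064E\u0628\u064E"  # verb "went"
gold = "\u0630\u064E\u0647\u064E\u0628"        # noun "gold" (pausal form)

print(went == gold)                                      # False
print(strip_diacritics(went) == strip_diacritics(gold))  # True
print(list_diacritics(went))                             # ['fatHa', 'fatHa', 'fatHa']
```

Once the diacritics are removed, the two words become identical, which is the ambiguity a tagger has to resolve from context when the input text is undiacritized.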
POS tagging is the task of assigning each word in a sentence a unique part-of-speech tag such as noun, verb, adverb, pronoun, or preposition. Arabic words are categorized into three main tags (parts of speech): nouns, which indicate attributes and circumstances; verbs, which indicate actions; and particles, which attach to verbs and nouns. Nouns are of two types, masculine and feminine. The three primary categories of Arabic parts of speech are presented in Table 2.
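As an illustration of how the three main categories could be assigned by hand-written rules, the sketch below applies a few simple rules in order. The particle list, the verb lexicon, and the rule order are assumptions made for this example; they are not the rules of the proposed tagger, and a practical rule set would be far larger.

```python
# Hypothetical, deliberately tiny lexicons for the example.
PARTICLES = {"في", "من", "إلى", "على", "عن", "لم", "لن"}
KNOWN_VERBS = {"ذهب", "كتب", "قرأ"}


def rule_based_tag(word: str) -> str:
    """Assign one of the three main tags using ordered rules."""
    if word in PARTICLES:
        return "PARTICLE"
    # A word carrying the definite article "ال" is treated as a noun.
    if word.startswith("ال") and len(word) > 2:
        return "NOUN"
    if word in KNOWN_VERBS:
        return "VERB"
    # Default: unknown words are tagged as nouns.
    return "NOUN"


sentence = "ذهب الولد إلى المدرسة".split()  # "the boy went to school"
print([(w, rule_based_tag(w)) for w in sentence])
```

Note that the definite-article rule is checked before the verb lexicon, so "الذهب" ("the gold") is tagged as a noun even though its stem also appears in the verb list.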

Proposed Cloud POS tagger
Cloud computing refers to anything that involves delivering hosted services over the Internet. These services can be divided into three classes: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS). The proposed POS tagger is deployed as SaaS, so it can be used online from anywhere, which saves effort and time by identifying parts of speech efficiently and quickly. Figure 3 presents the proposed POS tagger and its connections. The user starts from the private cloud (IaaS) and connects to the public cloud (PaaS) through the Internet provider (CaaS) to use the part-of-speech recognition system. The user then gets the results efficiently and reliably, which helps to complete applications according to the user's request.
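The SaaS deployment described above can be pictured as a request and response exchange between the user and the hosted tagger. The sketch below shows only that exchange as a plain function, so it runs without a network; the JSON field names (`text`, `tokens`, `word`, `tag`), the function names, and the placeholder tagger are hypothetical and do not describe the interface of the deployed system.

```python
import json


def toy_tagger(tokens):
    """Placeholder for the hosted tagging model (hypothetical lexicon)."""
    lexicon = {"ذهب": "VERB", "إلى": "PARTICLE"}
    return [lexicon.get(token, "NOUN") for token in tokens]


def handle_tag_request(body: bytes) -> bytes:
    """Service side: decode a JSON request, tag the text, encode a JSON response."""
    request = json.loads(body.decode("utf-8"))
    tokens = request["text"].split()
    tags = toy_tagger(tokens)
    response = {"tokens": [{"word": w, "tag": t} for w, t in zip(tokens, tags)]}
    # ensure_ascii=False keeps the Arabic text readable in the response.
    return json.dumps(response, ensure_ascii=False).encode("utf-8")


# Client side: build the request body and read the response.
request_body = json.dumps({"text": "ذهب الولد إلى المدرسة"}).encode("utf-8")
reply = json.loads(handle_tag_request(request_body).decode("utf-8"))
for item in reply["tokens"]:
    print(item["word"], item["tag"])
```

In a real deployment the handler would sit behind an HTTP endpoint in the cloud, and the client would send the same body over the Internet; keeping the handler a pure function makes it easy to test independently of the hosting platform.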

Results and Discussion
This section discusses the proposed POS tagger based on cloud computing techniques. To test the proposed tagger, the online Arabic linguistic tool was implemented (APL, 2020). Arabic text was extracted from an online source (Arabic_Text, 2021). A manual preprocessing phase was applied to format the text, as shown in Figure 4. The result of POS tagging is presented in Table 3. The results are encouraging: the tagger recognized both POS tags and named entities. It is worth mentioning that all numbers should be written as digits, not in words. For example, the word (الخامس) in line 55 can be a place or a number, which leads to ambiguity in determining the exact part of speech. Therefore, an automatic or manual preprocessing step must convert any written number to numerical digits.
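The number-conversion step described above could be automated along the following lines. This is a minimal sketch under stated assumptions: the lookup table is a small hypothetical sample, and the function replaces every matching word without checking context, so a word such as "الخامس" used as part of a place name would still need manual review.

```python
# Hypothetical sample mapping from written number words to digits.
NUMBER_WORDS = {
    "واحد": "1",
    "اثنان": "2",
    "ثلاثة": "3",
    "أربعة": "4",
    "خمسة": "5",
    "الخامس": "5",
}


def normalize_numbers(text: str) -> str:
    """Replace known number words with digits; leave all other tokens unchanged."""
    return " ".join(NUMBER_WORDS.get(token, token) for token in text.split())


print(normalize_numbers("الفصل الخامس"))
```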

Comparison with previous studies shows that there is no common corpus for the Arabic language and no unified set of part-of-speech tags. Researchers have published different tag sets, ranging from 25 to 177 tags. Besides, they used diverse methods to tag Arabic text, which were evaluated using various evaluation factors. Unfortunately, a fair comparison is not possible unless the part-of-speech tags, the amount of training data, and the method used are unified. However, Table 4 presents the results of the comparison based on the most appropriate rating factors.
The studies presented in Table 4 achieved accuracy, precision, and recall rates of more than 95%.
Figure 5. Accuracy rate.
The proposed cloud-based tagger can be used from anywhere and at any time, which helps to distinguish the parts of speech with high accuracy and easy access. It also helps to develop new applications efficiently.
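For reference, the rating factors named above (accuracy, precision, and recall) can be computed from a gold-standard tag sequence and a predicted one as sketched below. The standard definitions are used; the function names and the four-word example are assumptions for illustration and are not data from Table 4.

```python
def accuracy(gold, pred):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def precision_recall(gold, pred, tag):
    """Precision and recall for a single tag."""
    tp = sum(g == tag and p == tag for g, p in zip(gold, pred))
    fp = sum(g != tag and p == tag for g, p in zip(gold, pred))
    fn = sum(g == tag and p != tag for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Illustrative example: one verb is wrongly predicted as a noun.
gold_tags = ["NOUN", "VERB", "NOUN", "PART"]
pred_tags = ["NOUN", "NOUN", "NOUN", "PART"]

print(accuracy(gold_tags, pred_tags))                  # 0.75
print(precision_recall(gold_tags, pred_tags, "NOUN"))  # precision 2/3, recall 1.0
```

Because precision and recall are computed per tag, results from studies with different tag sets are hard to compare directly, which is consistent with the caveat stated in the comparison.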
The review of previous studies showed an urgent need to create collections of Arabic texts arranged in a way that is easy to access and manipulate. There is also a need to unify the number and types of parts of speech to enable researchers to build efficient and fast extraction systems. Finally, it is necessary to establish research groups concerned with Arabic language processing. Such groups can discover and provide mathematical models that prevent ambiguity in determining the meaning of words and in using them in the appropriate context.