The availability of machine-readable Arabic special-domain text in digital libraries, on the websites of Arabic university publications, and in refereed journals fosters numerous studies and applications. One such application is automatic term extraction from special-domain corpora. The extracted terms can serve as a foundation for other applications and research, such as special domain dictionary...
Interest in extracting and analyzing evaluations and opinions of services or products from large bodies of text has been increasing in recent years. It is important to classify predicates according to sense, because whether or not a statement includes the speaker's opinion depends strongly on its predicate. It is generally assumed that Japanese parts of speech (POS) for predicates are classified according...
This paper proposes a novel method to evaluate the performance of New Word Detection (NWD) based on repeats extraction. For small-scale corpora, we propose employing Conditional Random Fields (CRF) as a statistical framework to estimate the effects of different NWD strategies. For large-scale corpora, since annotated corpora are limited in size, comparative experiments are unable...
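The excerpt does not say how repeats are extracted. A common reading is that "repeats" are character strings occurring more than once in a corpus, which then serve as new-word candidates. The following is a minimal sketch under that assumption; `extract_repeats` and its length and frequency thresholds are hypothetical, not the paper's method.

```python
from collections import Counter


def extract_repeats(text: str, min_len: int = 2, max_len: int = 4,
                    min_freq: int = 2) -> dict:
    """Return every character n-gram of length min_len..max_len that occurs
    at least min_freq times in text (overlapping occurrences are counted)."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    # Keep only strings that actually repeat; these are the new-word candidates.
    return {gram: c for gram, c in counts.items() if c >= min_freq}


print(extract_repeats("新词发现新词识别"))  # {'新词': 2}
```

A real NWD pipeline would then filter these candidates against a dictionary and score them, for example with the CRF-based strategies the abstract mentions.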
Named Entity Recognition (NER) is an important task in Natural Language Processing (NLP) applications. It is the process of identifying proper nouns and classifying them into classes such as person, location, organization and miscellaneous. Substantial work has been done for English and other European languages, achieving greater accuracy than has been reached for Indian languages. Although NER in Indian languages...
This paper describes the development of an Indonesian NER system using online data such as Wikipedia and DBpedia. The system is based on the Stanford NER system [8] and uses training documents constructed automatically from Wikipedia. Each entity in the Wikipedia documents, i.e. each word or phrase that has a hyperlink, is tagged according to information obtained from DBpedia. In this...
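The automatic labeling step described above can be sketched as follows: every token covered by a hyperlink receives the NER label that corresponds to the DBpedia class of the link target, and all other tokens receive `O`. The class-to-label mapping, the span format and the example types are hypothetical. Because DBpedia usually assigns more specific subclasses (such as City), a real mapping would also have to follow the ontology's class hierarchy. The output uses one token and label per line, separated by a tab.

```python
# Hypothetical mapping from DBpedia ontology classes to NER labels.
DBPEDIA_TO_NER = {
    "Person": "PERSON",
    "Place": "LOCATION",
    "Organisation": "ORGANIZATION",
}


def tag_tokens(tokens, links, entity_types):
    """Label tokens covered by a hyperlink with the NER class of its target.

    tokens:       word tokens of one Wikipedia sentence
    links:        list of (start, end, target) token spans, end exclusive
    entity_types: dict mapping a link target to its DBpedia class
    """
    labels = ["O"] * len(tokens)
    for start, end, target in links:
        label = DBPEDIA_TO_NER.get(entity_types.get(target, ""), "O")
        for i in range(start, end):
            labels[i] = label
    return list(zip(tokens, labels))


def to_tsv(tagged):
    """One 'token<TAB>label' line per token."""
    return "\n".join(f"{tok}\t{label}" for tok, label in tagged)


tokens = ["Joko", "Widodo", "lahir", "di", "Surakarta"]
links = [(0, 2, "Joko_Widodo"), (4, 5, "Surakarta")]
types = {"Joko_Widodo": "Person", "Surakarta": "Place"}  # simplified types
print(to_tsv(tag_tokens(tokens, links, types)))
```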
This paper discusses the concessive compact construction “wanyi···ye···” from the perspective of Chinese information processing. The simple sentence with “wanyi” and “ye” and the concessive compact construction “wanyi···ye···” are syntactically similar, so the two are first distinguished. The semantic feature of the concessive compact construction “wanyi···ye···” is subjectivity, which is revealed in different ways.
The conjunction Zongshi can be used in different syntactic contexts. Zongshi usually introduces a simple sentence that serves as a subordinate clause in a compound sentence. Zongshi and other adverbs or conjunctions can express different logical relations in compound sentences. The original meaning of Zongshi is either resuming or stating a fact, depending on the specific context. The sentence with Zongshi...
Emotional tendency refers to people's attitude towards people or things. It is a kind of subjective judgment and can be divided into several categories, such as praise or criticism, positive or negative, good or bad. Judging the emotional tendency of emotional words and deciding how to assign weights to them form the basis of text tendency analysis. The study of semantic weight has been...
Imperative sentences with assertive mood (ISAM), positioned between typical declarative sentences and typical imperative sentences, take the form of declarative sentences but perform the function of imperative sentences. They are characterized by verbs indicating an action classification, which are called “performative verbs”. The essay first explains why an imperative sentence with assertive...
Previous discussions on the translation of travel materials have mainly been confined to functional and semiotic perspectives. The authors of this paper hold that Xinjiang travel materials contain implicit information related to distinctive ethnic, geographical and historical cultures, which English speakers who do not share the same cultural background cannot fully absorb. They try to...
Many studies have explored the use of existing multilingual speech corpora to build an acoustic model for a target language. These works on multilingual acoustic modeling often use multilingual acoustic models to create an initial model. This initial model is often suboptimal for decoding speech in the target language. Some target-language speech is then used to adapt and improve...
In this paper, features extracted from the modulation spectrogram are used to classify phonemes in the Gujarati language. The modulation spectrogram, which is a two-dimensional (2-D) feature vector, is reduced to a smaller feature dimension using the proposed feature extraction method. The Gujarati database was manually segmented into 31 phoneme classes. These phonemes are then classified using support...
Obstruents are key landmark events in the speech signal. In this paper, we propose the use of the spectral transition measure (STM) to locate obstruents in continuous speech. The proposed approach does not take into account any prior information (such as the phonetic sequence, the speech transcription, or the number of obstruents in the speech). Hence, this approach is unsupervised and unconstrained...
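As it is commonly defined, the spectral transition measure at frame $n$ is the mean squared regression slope of the $D$ feature coefficients over a window of $\pm M$ frames. In this definition, $a_i(n) = \sum_{m=-M}^{M} m\,c_i(n+m) / \sum_{m=-M}^{M} m^2$ and $\mathrm{STM}(n) = \frac{1}{D}\sum_{i=1}^{D} a_i(n)^2$. Peaks in this measure mark rapid spectral change, which is where the approach above looks for obstruents. The following sketch assumes cepstral-style feature vectors, $M = 2$ and a simple threshold peak picker. The paper's exact features and peak-picking rules are not given in the excerpt.

```python
def spectral_transition_measure(frames, M=2):
    """STM(n) = (1/D) * sum_i a_i(n)^2, where a_i(n) is the least-squares slope
    of feature i over frames n-M..n+M. frames is a list of D-dim vectors.
    Frames within M of either edge are left at 0."""
    n_frames, dim = len(frames), len(frames[0])
    denom = sum(m * m for m in range(-M, M + 1))
    stm = [0.0] * n_frames
    for n in range(M, n_frames - M):
        total = 0.0
        for i in range(dim):
            slope = sum(m * frames[n + m][i] for m in range(-M, M + 1)) / denom
            total += slope * slope
        stm[n] = total / dim
    return stm


def pick_peaks(values, threshold):
    """Indices of local maxima above threshold; on a flat-topped peak the
    last frame of the plateau is returned."""
    return [n for n in range(1, len(values) - 1)
            if values[n] > threshold
            and values[n] >= values[n - 1]
            and values[n] > values[n + 1]]


# A 1-D toy signal with one abrupt spectral change between frames 4 and 5.
frames = [[0.0]] * 5 + [[1.0]] * 5
stm = spectral_transition_measure(frames)
print(pick_peaks(stm, threshold=0.01))  # [5]
```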
Vocal Tract Length Normalization (VTLN) is used to design vocal tract length normalized Automatic Speech Recognition (ASR) systems. It has improved the performance of ASR systems by taking into account physiological differences among speakers. Recently, a number of speech recognition applications have been developed for Indian languages. In this paper, we use a state-of-the-art method...
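VTLN is often implemented by warping the frequency axis with a speaker-specific factor α before the filterbank is applied. The excerpt does not say which VTLN method the paper uses. The following sketch shows one widely used piecewise-linear warping function. The breakpoint at 0.8 × Nyquist and the convention of multiplying by α (some toolkits divide instead) are assumptions.

```python
def piecewise_linear_warp(freq, alpha, nyquist=8000.0, cutoff=None):
    """Warp a frequency (Hz) by factor alpha: scale by alpha below the cutoff,
    then follow a straight line from (cutoff, alpha*cutoff) to
    (nyquist, nyquist), so the warped axis still ends at Nyquist."""
    if cutoff is None:
        cutoff = 0.8 * nyquist  # assumed breakpoint
    if not 0 <= freq <= nyquist:
        raise ValueError("freq must lie in [0, nyquist]")
    if alpha * cutoff >= nyquist:
        raise ValueError("alpha * cutoff must stay below nyquist")
    if freq <= cutoff:
        return alpha * freq
    slope = (nyquist - alpha * cutoff) / (nyquist - cutoff)
    return alpha * cutoff + slope * (freq - cutoff)


# A grid of candidate warp factors, e.g. 0.80, 0.82, ..., 1.20.
alphas = [round(0.80 + 0.02 * k, 2) for k in range(21)]
print([round(piecewise_linear_warp(1000.0, a), 1) for a in (0.9, 1.0, 1.1)])
```

In a typical VTLN setup, each speaker's α is chosen from such a grid by maximizing the likelihood of that speaker's warped features under the acoustic model.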
Singer IDentification (SID) is a very challenging problem in Music Information Retrieval (MIR). Instrumental accompaniment, the quality of recording equipment and other singing voices (in chorus) make SID a difficult research problem. In this paper, we propose an SID system on a large database of 500 Hindi (Bollywood) songs using state-of-the-art Mel Frequency Cepstral Coefficients...
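MFCCs are obtained by passing a frame's power spectrum through triangular filters spaced evenly on the mel scale, taking logarithms, and applying a discrete cosine transform. The following sketch covers only the mel-scale part. It uses the common formula $m = 2595 \log_{10}(1 + f/700)$ and computes the filter edge frequencies. The choice of 26 filters over 0 to 8000 Hz is an assumption, not a detail from the paper.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_edges(n_filters, f_low, f_high):
    """n_filters + 2 frequencies (Hz), equally spaced on the mel scale.
    Filter k is a triangle rising from edges[k] to a peak at edges[k+1]
    and falling back to zero at edges[k+2]."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + k * step) for k in range(n_filters + 2)]


edges = mel_filter_edges(26, 0.0, 8000.0)
print([round(f) for f in edges[:5]])  # narrow filters at low frequencies
```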
Ba-construction is a special syntactic structure in modern Chinese. This paper gives a short summary of these topics and extracts 500 sentences containing the Ba-construction from CCRL. After a detailed analysis of the samples' phrase structure, the author builds computational rules based on a context-free grammar (CFG). These rules are tested with CTT, a parse tree tracer. The author also points out the problems that exist in the...
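The paper's CFG rules, which were derived from the 500 CCRL sentences, are not given in the excerpt. The following sketch instead uses a hypothetical toy grammar in Chomsky normal form to show how a few CFG rules can cover a Ba-construction of the form NP 把 NP V COMP, and how a CYK recognizer checks a sentence against them.

```python
from itertools import product

# Toy lexicon and CNF grammar (hypothetical, not the paper's rules).
LEXICON = {
    "我": {"NP"}, "他": {"NP"}, "书": {"NP"}, "门": {"NP"},
    "把": {"BA"},
    "看": {"V"}, "关": {"V"},
    "完": {"COMP"}, "上": {"COMP"},
}
BINARY_RULES = {
    ("NP", "VP"): "S",    # S   -> NP VP
    ("BAP", "VC"): "VP",  # VP  -> BAP VC
    ("BA", "NP"): "BAP",  # BAP -> BA NP  (BA is the word 把)
    ("V", "COMP"): "VC",  # VC  -> V COMP (verb + resultative complement)
}


def cyk_recognize(words, start="S"):
    """Return True if the word sequence can be derived from `start`."""
    n = len(words)
    # chart[i][j] holds every category that spans words[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(LEXICON.get(word, set()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left, right in product(chart[i][k], chart[k][j]):
                    parent = BINARY_RULES.get((left, right))
                    if parent is not None:
                        chart[i][j].add(parent)
    return n > 0 and start in chart[0][n]


print(cyk_recognize(list("我把书看完")))  # True: "I finished reading the book"
print(cyk_recognize(list("我把看完")))    # False: 把 lacks its object NP
```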
We describe our work on designing a linguistically principled part-of-speech (POS) tagset for the Indonesian language. The process involves a detailed study and analysis of existing tagsets and the manual tagging of an Indonesian corpus. The results of this work are an Indonesian POS tagset consisting of 23 tags and an Indonesian corpus of over 250,000 lexical tokens that have been manually tagged...
This paper describes work on a rule-based part-of-speech tagger for the Indonesian language. The system tokenizes documents, taking multi-word expressions into account, and recognizes named entities. It then tags every token, proceeding from closed-class words to open-class words, and disambiguates the tags using a set of manually defined rules. The system currently...
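The tagging order described above can be sketched in three passes. First, closed-class words are looked up in a fixed lexicon. Next, open-class words receive candidate tag sets. Finally, hand-written rules resolve the remaining ambiguity. The lexicons, tag names and the two rules are hypothetical, and tokenization of multi-word expressions and named-entity recognition are omitted.

```python
# Hypothetical mini-lexicons; the paper's actual lexicons and tagset differ.
CLOSED_CLASS = {"saya": "PRON", "itu": "DET", "dan": "CONJ", "di": "PREP"}
OPEN_CLASS = {
    "makan": {"VERB"},
    "punya": {"VERB"},
    "ular": {"NOUN"},
    "bisa": {"MODAL", "NOUN"},  # 'can' or 'venom'
}


def candidate_tags(tokens):
    """Closed-class words first (unambiguous), then open-class candidates.
    Unknown words default to NOUN (an assumption)."""
    candidates = []
    for tok in tokens:
        word = tok.lower()
        if word in CLOSED_CLASS:
            candidates.append({CLOSED_CLASS[word]})
        else:
            candidates.append(set(OPEN_CLASS.get(word, {"NOUN"})))
    return candidates


def disambiguate(candidates):
    """Resolve ambiguous tokens with hand-written rules."""
    tags = []
    for i, cands in enumerate(candidates):
        if len(cands) == 1:
            tags.append(next(iter(cands)))
            continue
        following = candidates[i + 1] if i + 1 < len(candidates) else set()
        if "MODAL" in cands and "VERB" in following:
            tags.append("MODAL")  # rule 1: modal reading before a verb
        elif "NOUN" in cands:
            tags.append("NOUN")  # rule 2: otherwise prefer the noun reading
        else:
            tags.append(sorted(cands)[0])  # fallback: deterministic choice
    return tags


def tag(tokens):
    return list(zip(tokens, disambiguate(candidate_tags(tokens))))


print(tag(["Saya", "bisa", "makan"]))          # 'I can eat'
print(tag(["Ular", "itu", "punya", "bisa"]))   # 'That snake has venom'
```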
In this paper, the authors use the Tibetan word segmentation system IEA-TWordSeg to segment a total of 1,271 sentences in a closed set and 1,000 sentences in an open set. The test accuracy is 99.54% and 92.41%, respectively. The authors describe the types of segmentation errors and their causes, and show the proportion of different types of segmentation...