Language Identification Using Global Statistics of Natural Languages
2nd Romanian-Hungarian Joint Symposium on Applied Computational Intelligence
Volume
Proceedings
Conference Location
Timisoara
Abstract
This article presents a new method for identifying the language of a written
document. The method relies on analysing simple descriptive statistics of the
text, such as average word length or consonant congestion.
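A minimal Python sketch of the two statistics named above follows. It is an illustration, not the authors' implementation. The abstract does not define "consonant congestion", so the code assumes it is the mean length of maximal runs of consecutive consonants within words. The vowel set `VOWELS` is likewise a hypothetical simplification covering common Latin-script vowels, with `y` counted as a vowel.

```python
import re

# Assumption: simplified vowel set for Latin-script languages; 'y' counts as a vowel.
VOWELS = set("aeiouyáàâäéèêëíîïóôöőúùûüűåæø")

# Words are maximal runs of letters (word characters minus digits and '_').
WORD_RE = re.compile(r"[^\W\d_]+")


def tokenize(text: str) -> list[str]:
    """Split text into lower-cased alphabetic words."""
    return WORD_RE.findall(text.lower())


def average_word_length(text: str) -> float:
    """Mean number of letters per word (0.0 if the text has no words)."""
    words = tokenize(text)
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


def consonant_congestion(text: str) -> float:
    """Assumed definition: mean length of maximal consonant runs inside words."""
    runs = []
    for word in tokenize(text):
        run = 0
        for ch in word:
            if ch in VOWELS:
                # A vowel ends the current consonant run, if any.
                if run:
                    runs.append(run)
                run = 0
            else:
                run += 1
        # Close a run that reaches the end of the word.
        if run:
            runs.append(run)
    return sum(runs) / len(runs) if runs else 0.0
```

For example, `consonant_congestion("Strength")` yields 3.5, because the word contains the consonant runs "str" (3) and "ngth" (4).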
To measure the method's effectiveness, an application was developed that
classifies English, Hungarian, German, Spanish, Croatian, French and
Norwegian documents by analysing average word length, the ratio of certain
characters, word endings and consonant congestion.
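The abstract does not say which classifier the application uses. The sketch below shows one plausible setup: a nearest-centroid classifier over a vector of average word length, character ratios and word-ending ratios. The marker characters (`MARKER_CHARS`), the endings (`MARKER_ENDINGS`) and the choice of nearest-centroid classification with Euclidean distance are all hypothetical and are not taken from the paper. Consonant congestion is omitted for brevity.

```python
import math
import re

# Hypothetical marker characters; the paper's actual character set is not given.
MARKER_CHARS = "őűßñøåçčćšžw"
# Hypothetical word endings; not taken from the paper.
MARKER_ENDINGS = ["ed", "ng", "en", "er", "os", "es", "ak", "ek", "je", "nt"]

WORD_RE = re.compile(r"[^\W\d_]+")


def features(text: str) -> list[float]:
    """Feature vector: average word length, marker-character ratios, ending ratios."""
    words = WORD_RE.findall(text.lower())
    if not words:
        return [0.0] * (1 + len(MARKER_CHARS) + len(MARKER_ENDINGS))
    letters = sum(len(w) for w in words)
    vec = [letters / len(words)]
    # Share of all letters that are each marker character.
    vec += [sum(w.count(c) for w in words) / letters for c in MARKER_CHARS]
    # Share of words ending with each marker ending.
    vec += [sum(w.endswith(e) for w in words) / len(words) for e in MARKER_ENDINGS]
    return vec


def train(samples: dict[str, list[str]]) -> dict[str, list[float]]:
    """Build one centroid (mean feature vector) per language from sample texts."""
    centroids = {}
    for lang, texts in samples.items():
        vectors = [features(t) for t in texts]
        centroids[lang] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return centroids


def classify(text: str, centroids: dict[str, list[float]]) -> str:
    """Return the language whose centroid is nearest in Euclidean distance."""
    vec = features(text)
    return min(centroids, key=lambda lang: math.dist(vec, centroids[lang]))
```

In this sketch the features are not scaled, so average word length (typically around 4 to 7) dominates the distance compared with the small ratios. A practical version would standardize each feature over the training set before computing distances.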