Finta Fň - Converting UTF-8 to ASCII
The guides published so far were complicated and clumsy, and I could not get any of them to work satisfactorily. In the end I produced my own simple solution, which I offer here. It works only if you know the language of the text being converted, and therefore a character set that can represent it.
In my case I was converting Czech and Slovak text, for which windows-1250 can safely be used as the intermediate character set. Other languages require a different, appropriate character set.
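<?php
function utf2ascii($string) {
    // Convert to the single-byte intermediate charset so strtr() can map byte for byte.
    $string = iconv('utf-8', 'windows-1250', $string);
    // Accented letters, then windows-1250 dashes and quotation marks.
    $win   = "ěščřžýáíéťňďúůóöüäĚŠČŘŽÝÁÍÉŤŇĎÚŮÓÖÜËÄ\x97\x96\x91\x92\x84\x93\x94\xAB\xBB";
    $ascii = "escrzyaietnduuoouelloaESCRZYAIETNDUUOOUEA\x2D\x2D\x27\x27\x22\x22\x22\x22\x22";
    $string = strtr($string, $win, $ascii);
    return $string;
}
?>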
Save this routine as a separate file and, note, it must be saved in the same character set that you use as the intermediate step. Otherwise the accented letters in $win will not match the converted bytes and it will not work.
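The file-encoding requirement can be sidestepped with the array form of strtr(), which replaces whole substrings and therefore accepts multi-byte UTF-8 keys. The following is only a sketch of that variation, not the code from the post: the name utf2ascii_map and the exact contents of the table are assumptions, and the file has to be saved as UTF-8.

```php
<?php
// Hypothetical variant: map UTF-8 letters directly, with no intermediate
// single-byte charset. Covers Czech/Slovak letters only (an assumption).
function utf2ascii_map(string $text): string
{
    $map = [
        'ě' => 'e', 'š' => 's', 'č' => 'c', 'ř' => 'r', 'ž' => 'z',
        'ý' => 'y', 'á' => 'a', 'í' => 'i', 'é' => 'e', 'ť' => 't',
        'ň' => 'n', 'ď' => 'd', 'ú' => 'u', 'ů' => 'u', 'ó' => 'o',
        'ö' => 'o', 'ü' => 'u', 'ä' => 'a', 'ľ' => 'l',
        'Ě' => 'E', 'Š' => 'S', 'Č' => 'C', 'Ř' => 'R', 'Ž' => 'Z',
        'Ý' => 'Y', 'Á' => 'A', 'Í' => 'I', 'É' => 'E', 'Ť' => 'T',
        'Ň' => 'N', 'Ď' => 'D', 'Ú' => 'U', 'Ů' => 'U', 'Ó' => 'O',
        'Ö' => 'O', 'Ü' => 'U', 'Ä' => 'A', 'Ľ' => 'L',
    ];
    // With an array argument strtr() matches whole keys, not single bytes.
    return strtr($text, $map);
}

echo utf2ascii_map("Příliš žluťoučký kůň"), "\n"; // Prilis zlutoucky kun
```

Characters that are not in the table pass through unchanged, so the same language restriction applies as in the original routine.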
Re: cstocs
It can be installed easily from CPAN, or for example with apt-get install cstocs
Re: cstocs
Not if you are on shared hosting and have no power to talk the admin into installing it there. The solution I proposed needs no "intervention from above". Apart from iconv(), that is, but nowadays it is available in most installations.
Re: cstocs
But seriously... those Perl things can of course also be installed for a local user, so that would work as well, but otherwise I understand.
Re: cstocs
cp A.pm /tam/kde/admin/nepozera/lib/
==============
#!/usr/bin/perl
use lib qw(/tam/kde/admin/nepozera/lib/);
use A;
...
iconv
Re: iconv
iconv -f UTF8 -t ASCII//TRANSLIT -o soubor2.txt soubor.txt
But otherwise the original post is complete nonsense. What will you do when someone enters other accented characters, such as ą, ñ, Ł, …? OK, it may come in handy in a little home-made program, but it has no place in any "professional" system that claims to accept Unicode (UTF-8).
Re: iconv
In my post I clearly stressed that the conversion is limited to cases where we know we are converting only from Czech and Slovak. It is not a universal solution, nor can it be one. Of the myriad of scripts covered by Unicode, only a small part can be represented in the Latin alphabet.
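The limitation argued about here can at least be made visible instead of silent. As a rough sketch (the helper name nonAsciiLeftovers is hypothetical and appears nowhere in the thread), one could list the characters that a language-specific table left untouched, assuming the text is valid UTF-8:

```php
<?php
// Hypothetical helper: return the distinct non-ASCII characters that are
// still present after a table-based conversion has been applied.
function nonAsciiLeftovers(string $text): array
{
    // The /u modifier treats the subject as UTF-8, so each match is one
    // whole character rather than a single byte.
    preg_match_all('/[^\x00-\x7F]/u', $text, $matches);
    return array_values(array_unique($matches[0]));
}

// A Czech-only table: the Polish and Spanish letters pass through unchanged.
$converted = strtr("Łódź, señor, čaj", ['č' => 'c']);
echo implode(' ', nonAsciiLeftovers($converted)), "\n"; // Ł ó ź ñ
```

An empty result means the output is plain 7-bit ASCII; anything else tells you which letters the table does not cover.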
Re: iconv
recode utf8..flat soubor
Hello world? And almost every Czech forgets about it ;-) plus links
and in Slovakia we also have ľ
or is it perhaps hiding in those \xAB?
and links should also be usable for conversion to ASCII:
display the text as an ASCII page and save the "formatted output"
Re: Hello world? And almost every Czech forgets about it ;-) plus links
I don't understand
1) to accommodate languages with more than 256 characters
2) so that characters of several languages can appear in one document
How does the author intend to write a piece of Russian, a piece of Czech and, say, a piece of Korean into one ASCII document?
recode can do it too
Re: recode can do it too
recode: Ambiguous output in step `UTF-8..ISO-8859-1'
And if you want to do it yourself, and properly
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
The sixth column there lists the characters that a given character is composed of, for example:
010C;LATIN CAPITAL LETTER C WITH CARON;Lu;0;L;0043 030C;;;;N;LATIN CAPITAL LETTER C HACEK;;;010D;
so the capital Č is composed of C (0043) and a caron (030C)
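As a rough illustration of this idea, the sketch below builds a replacement table from UnicodeData.txt records by taking the first code point of the decomposition field. The function name buildBaseMap is hypothetical, the code assumes the mbstring extension (for mb_chr), and only two sample records are used here rather than the whole file.

```php
<?php
// Sketch: derive "letter without diacritic" mappings from UnicodeData.txt
// records. buildBaseMap() is a hypothetical name; requires mbstring.
function buildBaseMap(array $lines): array
{
    $map = [];
    foreach ($lines as $line) {
        $fields = explode(';', $line);
        // Field 0 is the code point, field 5 (sixth column) the decomposition.
        if (count($fields) < 6 || $fields[5] === '') {
            continue;
        }
        $parts = explode(' ', $fields[5]);
        // Skip tagged compatibility decompositions such as "<compat> ...".
        if ($parts[0][0] === '<') {
            continue;
        }
        $base = hexdec($parts[0]);
        // Keep the mapping only when the base character is plain ASCII.
        if ($base < 0x80) {
            $map[mb_chr(hexdec($fields[0]), 'UTF-8')] = chr($base);
        }
    }
    return $map;
}

$sample = [
    '010C;LATIN CAPITAL LETTER C WITH CARON;Lu;0;L;0043 030C;;;;N;LATIN CAPITAL LETTER C HACEK;;;010D;',
    '010D;LATIN SMALL LETTER C WITH CARON;Ll;0;L;0063 030C;;;;N;LATIN SMALL LETTER C HACEK;;010C;;010C',
];
echo strtr("Čech, čaj", buildBaseMap($sample)), "\n"; // Cech, caj
```

Characters whose decomposition starts with another accented letter are skipped by this sketch; handling them would mean following the decomposition recursively.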
Re: And if you want to do it yourself, and properly
Ligatures, punctuation marks, quotation marks and the like should of course be replaced as well, but that takes us somewhere else (and, above all, Unicode no longer carries that information).
konwert
konwert utf8-ascii <vstup >vystup
It also has plenty of other filters, e.g. conversion to and from TeX, good options for transliterating Cyrillic, the ability to be used as a terminal filter, and an easy way to add your own conversion tables.
All right, all right
I promise I will never do it again.
I wanted to help someone who might have had a similar problem. I am not angling for a Nobel Prize, nor do I expect any special gratitude. But I am surprised by the negative tone of most of your comments, the milder of which imply what an idiot I am for not knowing their solution.
Re: All right, all right
You just have a small bug there: "ello" somehow got into the middle of the $ascii string, and it does not belong there.
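A slip like this is easy to miss because the three-argument strtr() silently ignores the surplus characters of the longer string, so every mapping after the stray text is shifted. One possible guard, sketched here with a hypothetical helper buildTable that is not part of the posted code, is to pair the two tables explicitly and fail when their lengths differ (the file is assumed to be saved as UTF-8):

```php
<?php
// Hypothetical helper: pair two character tables into a strtr() map and
// refuse to continue when they do not have the same number of characters.
function buildTable(string $from, string $to): array
{
    // Split on UTF-8 character boundaries, dropping the empty edge pieces.
    $src = preg_split('//u', $from, -1, PREG_SPLIT_NO_EMPTY);
    $dst = preg_split('//u', $to, -1, PREG_SPLIT_NO_EMPTY);
    if (count($src) !== count($dst)) {
        throw new InvalidArgumentException(sprintf(
            'table mismatch: %d source vs %d target characters',
            count($src),
            count($dst)
        ));
    }
    return array_combine($src, $dst);
}

try {
    buildTable('öüä', 'ouelloa'); // excerpt of the posted tables
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n"; // table mismatch: 3 source vs 7 target characters
}
echo strtr('öüä', buildTable('öüä', 'oua')), "\n"; // oua
```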
klak
recode
info recode. Good luck (and thanks to L. Šafář of SUSE - Novell, who once taught it to me).
What about using the multi-byte functions?
function toAscii($text)
{
    mb_internal_encoding("UTF-8");
    $win = "ěščřžýáíéťňďúůóĚŠČŘŽÝÁÍÉŤŇĎÚŮÓ";
    $ascii = "escrzyaietnduuoESCRZYAIETNDUUO";
    for ($i = 0; $i < mb_strlen($ascii); $i++)
    {
        $text = mb_ereg_replace(mb_substr($win, $i, 1), mb_substr($ascii, $i, 1), $text);
    }
    return $text;
}
Perfect
function utf2ascii($inputStr) {
$FT_UTF8TOASCIISRC = array("\xc3\x80","\xc3\x81","\xc3\x82","\xc3\x83","\xc3\x84","\xc3\x85","\xc3\x87","\xc3\x88","\xc3\x89","\xc3\x8a","\xc3\x8b","\xc3\x8c","\xc3\x8d","\xc3\x8e","\xc3\x8f","\xc3\x91","\xc3\x92","\xc3\x93","\xc3\x94","\xc3\x95","\xc3\x96","\xc3\x99","\xc3\x9a","\xc3\x9b","\xc3\x9c","\xc3\x9d","\xc3\xa0","\xc3\xa1","\xc3\xa2","\xc3\xa3","\xc3\xa4","\xc3\xa5","\xc3\xa7","\xc3\xa8","\xc3\xa9","\xc3\xaa","\xc3\xab","\xc3\xac","\xc3\xad","\xc3\xae","\xc3\xaf","\xc3\xb1","\xc3\xb2","\xc3\xb3","\xc3\xb4","\xc3\xb5","\xc3\xb6","\xc3\xb9","\xc3\xba","\xc3\xbb","\xc3\xbc","\xc3\xbd","\xc3\xbf","\xc4\x80","\xc4\x81","\xc4\x82","\xc4\x83","\xc4\x84","\xc4\x85","\xc4\x86","\xc4\x87","\xc4\x88","\xc4\x89","\xc4\x8a","\xc4\x8b","\xc4\x8c","\xc4\x8d","\xc4\x8e","\xc4\x8f","\xc4\x92","\xc4\x93","\xc4\x94","\xc4\x95","\xc4\x96","\xc4\x97","\xc4\x98","\xc4\x99","\xc4\x9a","\xc4\x9b","\xc4\x9c","\xc4\x9d","\xc4\x9e","\xc4\x9f","\xc4\xa0","\xc4\xa1","\xc4\xa2","\xc4\xa3","\xc4\xa4","\xc4\xa5","\xc4\xa8","\xc4\xa9","\xc4\xaa","\xc4\xab","\xc4\xac","\xc4\xad","\xc4\xae","\xc4\xaf","\xc4\xb0","\xc4\xb4","\xc4\xb5","\xc4\xb6","\xc4\xb7","\xc4\xb9","\xc4\xba","\xc4\xbb","\xc4\xbc","\xc4\xbd","\xc4\xbe","\xc5\x83","\xc5\x84","\xc5\x85","\xc5\x86","\xc5\x87","\xc5\x88","\xc5\x8c","\xc5\x8d","\xc5\x8e","\xc5\x8f","\xc5\x90","\xc5\x91","\xc5\x94","\xc5\x95","\xc5\x96","\xc5\x97","\xc5\x98","\xc5\x99","\xc5\x9a","\xc5\x9b","\xc5\x9c","\xc5\x9d","\xc5\x9e","\xc5\x9f","\xc5\xa0","\xc5\xa1","\xc5\xa2","\xc5\xa3","\xc5\xa4","\xc5\xa5","\xc5\xa8","\xc5\xa9","\xc5\xaa","\xc5\xab","\xc5\xac","\xc5\xad","\xc5\xae","\xc5\xaf","\xc5\xb0","\xc5\xb1","\xc5\xb2","\xc5\xb3","\xc5\xb4","\xc5\xb5","\xc5\xb6","\xc5\xb7","\xc5\xb8","\xc5\xb9","\xc5\xba","\xc5\xbb","\xc5\xbc","\xc5\xbd","\xc5\xbe","\xc6\xa0","\xc6\xa1","\xc6\xaf","\xc6\xb0","\xc7\x8d","\xc7\x8e","\xc7\x8f","\xc7\x90","\xc7\x91","\xc7\x92","\xc7\x93","\xc7\x94","\xc7\x95","\xc7\x96","\xc7\x97","\xc7\x98","\xc7\x99","\xc7\x9a",
"\xc7\x9b","\xc7\x9c","\xc7\x9e","\xc7\x9f","\xc7\xa2","\xc7\xa3","\xc7\xa6","\xc7\xa7","\xc7\xa8","\xc7\xa9","\xc7\xaa","\xc7\xab","\xc7\xb0","\xc7\xb4","\xc7\xb5","\xc7\xb8","\xc7\xb9","\xc7\xba","\xc7\xbb","\xc7\xbc","\xc7\xbd","\xc7\xbe","\xc7\xbf","\xc8\x80","\xc8\x81","\xc8\x82","\xc8\x83","\xc8\x84","\xc8\x85","\xc8\x86","\xc8\x87","\xc8\x88","\xc8\x89","\xc8\x8a","\xc8\x8b","\xc8\x8c","\xc8\x8d","\xc8\x8e","\xc8\x8f","\xc8\x90","\xc8\x91","\xc8\x92","\xc8\x93","\xc8\x94","\xc8\x95","\xc8\x96","\xc8\x97","\xc8\x98","\xc8\x99","\xc8\x9a","\xc8\x9b","\xc8\x9e","\xc8\x9f","\xc8\xa6","\xc8\xa7","\xc8\xa8","\xc8\xa9","\xc8\xaa","\xc8\xab","\xc8\xac","\xc8\xad","\xc8\xae","\xc8\xaf","\xc8\xb2","\xc8\xb3","\xcd\xbe","\xce\x85","\xce\x87","\xe1\xb8\x80","\xe1\xb8\x81","\xe1\xb8\x82","\xe1\xb8\x83","\xe1\xb8\x84","\xe1\xb8\x85","\xe1\xb8\x86","\xe1\xb8\x87","\xe1\xb8\x88","\xe1\xb8\x89","\xe1\xb8\x8a","\xe1\xb8\x8b","\xe1\xb8\x8c","\xe1\xb8\x8d","\xe1\xb8\x8e","\xe1\xb8\x8f","\xe1\xb8\x90","\xe1\xb8\x91","\xe1\xb8\x92","\xe1\xb8\x93","\xe1\xb8\x98","\xe1\xb8\x99","\xe1\xb8\x9a","\xe1\xb8\x9b","\xe1\xb8\x9e","\xe1\xb8\x9f","\xe1\xb8\xa0","\xe1\xb8\xa1","\xe1\xb8\xa2","\xe1\xb8\xa3","\xe1\xb8\xa4","\xe1\xb8\xa5","\xe1\xb8\xa6","\xe1\xb8\xa7","\xe1\xb8\xa8","\xe1\xb8\xa9","\xe1\xb8\xaa","\xe1\xb8\xab","\xe1\xb8\xac","\xe1\xb8\xad","\xe1\xb8\xae","\xe1\xb8\xaf","\xe1\xb8\xb0","\xe1\xb8\xb1","\xe1\xb8\xb2","\xe1\xb8\xb3","\xe1\xb8\xb4","\xe1\xb8\xb5","\xe1\xb8\xb6","\xe1\xb8\xb7","\xe1\xb8\xba","\xe1\xb8\xbb","\xe1\xb8\xbc","\xe1\xb8\xbd","\xe1\xb8\xbe","\xe1\xb8\xbf","\xe1\xb9\x80","\xe1\xb9\x81","\xe1\xb9\x82","\xe1\xb9\x83","\xe1\xb9\x84","\xe1\xb9\x85","\xe1\xb9\x86","\xe1\xb9\x87","\xe1\xb9\x88","\xe1\xb9\x89","\xe1\xb9\x8a","\xe1\xb9\x8b","\xe1\xb9\x8c","\xe1\xb9\x8d","\xe1\xb9\x8e","\xe1\xb9\x8f","\xe1\xb9\x94","\xe1\xb9\x95","\xe1\xb9\x96","\xe1\xb9\x97","\xe1\xb9\x98","\xe1\xb9\x99","\xe1\xb9\x9a","\xe1\xb9\x9b","\xe1\xb9\x9e","\xe1\xb9\x9f","\xe1\xb9\xa0",
"\xe1\xb9\xa1","\xe1\xb9\xa2","\xe1\xb9\xa3","\xe1\xb9\xaa","\xe1\xb9\xab","\xe1\xb9\xac","\xe1\xb9\xad","\xe1\xb9\xae","\xe1\xb9\xaf","\xe1\xb9\xb0","\xe1\xb9\xb1","\xe1\xb9\xb2","\xe1\xb9\xb3","\xe1\xb9\xb4","\xe1\xb9\xb5","\xe1\xb9\xb6","\xe1\xb9\xb7","\xe1\xb9\xbc","\xe1\xb9\xbd","\xe1\xb9\xbe","\xe1\xb9\xbf","\xe1\xba\x80","\xe1\xba\x81","\xe1\xba\x82","\xe1\xba\x83","\xe1\xba\x84","\xe1\xba\x85","\xe1\xba\x86","\xe1\xba\x87","\xe1\xba\x88","\xe1\xba\x89","\xe1\xba\x8a","\xe1\xba\x8b","\xe1\xba\x8c","\xe1\xba\x8d","\xe1\xba\x8e","\xe1\xba\x8f","\xe1\xba\x90","\xe1\xba\x91","\xe1\xba\x92","\xe1\xba\x93","\xe1\xba\x94","\xe1\xba\x95","\xe1\xba\x96","\xe1\xba\x97","\xe1\xba\x98","\xe1\xba\x99","\xe1\xba\xa0","\xe1\xba\xa1","\xe1\xba\xa2","\xe1\xba\xa3","\xe1\xba\xa4","\xe1\xba\xa5","\xe1\xba\xa6","\xe1\xba\xa7","\xe1\xba\xa8","\xe1\xba\xa9","\xe1\xba\xaa","\xe1\xba\xab","\xe1\xba\xb8","\xe1\xba\xb9","\xe1\xba\xba","\xe1\xba\xbb","\xe1\xba\xbc","\xe1\xba\xbd","\xe1\xba\xbe","\xe1\xba\xbf","\xe1\xbb\x80","\xe1\xbb\x81","\xe1\xbb\x82","\xe1\xbb\x83","\xe1\xbb\x84","\xe1\xbb\x85","\xe1\xbb\x88","\xe1\xbb\x89","\xe1\xbb\x8a","\xe1\xbb\x8b","\xe1\xbb\x8c","\xe1\xbb\x8d","\xe1\xbb\x8e","\xe1\xbb\x8f","\xe1\xbb\x90","\xe1\xbb\x91","\xe1\xbb\x92","\xe1\xbb\x93","\xe1\xbb\x94","\xe1\xbb\x95","\xe1\xbb\x96","\xe1\xbb\x97","\xe1\xbb\xa4","\xe1\xbb\xa5","\xe1\xbb\xa6","\xe1\xbb\xa7","\xe1\xbb\xb2","\xe1\xbb\xb3","\xe1\xbb\xb4","\xe1\xbb\xb5","\xe1\xbb\xb6","\xe1\xbb\xb7","\xe1\xbb\xb8","\xe1\xbb\xb9","\xe1\xbf\x81","\xe1\xbf\xad","\xe1\xbf\xaf","\xe1\xbf\xbd","\xe2\x84\xaa","\xe2\x84\xab","\xe2\x89\xa0","\xe2\x89\xae","\xe2\x89\xaf");
$FT_UTF8TOASCIIDST = array("\x41","\x41","\x41","\x41","\x41","\x41","\x43","\x45","\x45","\x45","\x45","\x49","\x49","\x49","\x49","\x4e","\x4f","\x4f","\x4f","\x4f","\x4f","\x55","\x55","\x55","\x55","\x59","\x61","\x61","\x61","\x61","\x61","\x61","\x63","\x65","\x65","\x65","\x65","\x69","\x69","\x69","\x69","\x6e","\x6f","\x6f","\x6f","\x6f","\x6f","\x75","\x75","\x75","\x75","\x79","\x79","\x41","\x61","\x41","\x61","\x41","\x61","\x43","\x63","\x43","\x63","\x43","\x63","\x43","\x63","\x44","\x64","\x45","\x65","\x45","\x65","\x45","\x65","\x45","\x65","\x45","\x65","\x47","\x67","\x47","\x67","\x47","\x67","\x47","\x67","\x48","\x68","\x49","\x69","\x49","\x69","\x49","\x69","\x49","\x69","\x49","\x4a","\x6a","\x4b","\x6b","\x4c","\x6c","\x4c","\x6c","\x4c","\x6c","\x4e","\x6e","\x4e","\x6e","\x4e","\x6e","\x4f","\x6f","\x4f","\x6f","\x4f","\x6f","\x52","\x72","\x52","\x72","\x52","\x72","\x53","\x73","\x53","\x73","\x53","\x73","\x53","\x73","\x54","\x74","\x54","\x74","\x55","\x75","\x55","\x75","\x55","\x75","\x55","\x75","\x55","\x75","\x55","\x75","\x57","\x77","\x59","\x79","\x59","\x5a","\x7a","\x5a","\x7a","\x5a","\x7a","\x4f","\x6f","\x55","\x75","\x41","\x61","\x49","\x69","\x4f","\x6f","\x55","\x75","\xdc","\xfc","\xdc","\xfc","\xdc","\xfc","\xdc","\xfc","\xc4","\xe4","\xc6","\xe6","\x47","\x67","\x4b","\x6b","\x4f","\x6f","\x6a","\x47","\x67","\x4e","\x6e","\xc5","\xe5","\xc6","\xe6","\xd8","\xf8","\x41","\x61","\x41","\x61","\x45","\x65","\x45","\x65","\x49","\x69","\x49","\x69","\x4f","\x6f","\x4f","\x6f","\x52","\x72","\x52","\x72","\x55","\x75","\x55","\x75","\x53","\x73","\x54","\x74","\x48","\x68","\x41","\x61","\x45","\x65","\xd6","\xf6","\xd5","\xf5","\x4f","\x6f","\x59","\x79","\x3b","\xa8","\xb7","\x41","\x61","\x42","\x62","\x42","\x62","\x42","\x62","\xc7","\xe7","\x44","\x64","\x44","\x64","\x44","\x64","\x44","\x64","\x44","\x64","\x45","\x65","\x45","\x65","\x46","\x66","\x47","\x67","\x48","\x68","\x48","\x68","\x48","\x68","\x48"
,"\x68","\x48","\x68","\x49","\x69","\xcf","\xef","\x4b","\x6b","\x4b","\x6b","\x4b","\x6b","\x4c","\x6c","\x4c","\x6c","\x4c","\x6c","\x4d","\x6d","\x4d","\x6d","\x4d","\x6d","\x4e","\x6e","\x4e","\x6e","\x4e","\x6e","\x4e","\x6e","\xd5","\xf5","\xd5","\xf5","\x50","\x70","\x50","\x70","\x52","\x72","\x52","\x72","\x52","\x72","\x53","\x73","\x53","\x73","\x54","\x74","\x54","\x74","\x54","\x74","\x54","\x74","\x55","\x75","\x55","\x75","\x55","\x75","\x56","\x76","\x56","\x76","\x57","\x77","\x57","\x77","\x57","\x77","\x57","\x77","\x57","\x77","\x58","\x78","\x58","\x78","\x59","\x79","\x5a","\x7a","\x5a","\x7a","\x5a","\x7a","\x68","\x74","\x77","\x79","\x41","\x61","\x41","\x61","\xc2","\xe2","\xc2","\xe2","\xc2","\xe2","\xc2","\xe2","\x45","\x65","\x45","\x65","\x45","\x65","\xca","\xea","\xca","\xea","\xca","\xea","\xca","\xea","\x49","\x69","\x49","\x69","\x4f","\x6f","\x4f","\x6f","\xd4","\xf4","\xd4","\xf4","\xd4","\xf4","\xd4","\xf4","\x55","\x75","\x55","\x75","\x59","\x79","\x59","\x79","\x59","\x79","\x59","\x79","\xa8","\xa8","\x60","\xb4","\x4b","\xc5","\x3d","\x3c","\x3e");
return str_replace($FT_UTF8TOASCIISRC, $FT_UTF8TOASCIIDST, $inputStr);
}