Get index of first non-standard English character

c#,linq,character-encoding,globalization,diacritics
I'm trying to process a string and split it into two parts when I find a character that is not in the standard English alphabet. For example: "This is a stríng with áccents." I need to know the index of the first accented character (í), or of every accented character. I...
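The question above is tagged c#, but the idea can be sketched in Python for illustration. This is a minimal sketch, assuming "standard English" means the ASCII range and that accented letters are stored precomposed (NFC); with decomposed text the index would point at the combining mark rather than the base letter. The function names are hypothetical.

```python
def first_non_ascii_index(text):
    """Return the index of the first non-ASCII character, or -1 if none."""
    for i, ch in enumerate(text):
        if ord(ch) > 127:  # anything outside 7-bit ASCII
            return i
    return -1


def all_non_ascii_indices(text):
    """Return the indices of every non-ASCII character."""
    return [i for i, ch in enumerate(text) if ord(ch) > 127]


def split_at_first_non_ascii(text):
    """Split the string into the part before and the part from the first hit."""
    i = first_non_ascii_index(text)
    if i == -1:
        return text, ""
    return text[:i], text[i:]
```

Note that this also flags non-letters such as "¿" or "€"; a stricter test on letters only would need a different predicate.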

Is there any way to force ipython to interpret utf-8 symbols?

string,utf-8,ipython,literals,diacritics
I'm using IPython Notebook. What I want to do is search a literal string for any Spanish accented letters (ñ, á, é, í, ó, ú, Ñ, Á, É, Í, Ó, Ú) and change them to their closest representation in the English alphabet. I decided to write a simple function and give it a go: def remove_accent(n): listn = list(n) for...

unicode character for a with e above it

unicode,diacritics,ligature
The letters ä, ö, ü in German were written (e.g., in Gutenberg's Bible) as the respective vowels with a tiny e printed right above them. Are these characters available in Unicode? They looked something like:
e e e e e e
A O U a o u
If they...

Eliminate accents in python

python,unicode,diacritics
I have this function to remove accents in a word: def remove_accents(word): return ''.join(x for x in unicodedata.normalize('NFKD', word) if x in string.ascii_letters) But when I run it, it shows an error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xf3 in position 3: ordinal not in range(128) The character in position...
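A possible variant of the function above, as a sketch for Python 3 rather than a confirmed fix for the asker's setup: it decodes byte input explicitly before normalizing, and drops combining marks instead of keeping only ASCII letters (which would also discard spaces and digits). The `encoding` parameter is an assumption added for illustration; the right value depends on how the input was actually encoded.

```python
import unicodedata


def remove_accents(word, encoding="utf-8"):
    """Strip combining marks from text; bytes are decoded first.

    `encoding` is a hypothetical parameter: pass the codec the bytes
    were really written in (for example "latin-1").
    """
    if isinstance(word, bytes):
        word = word.decode(encoding)
    decomposed = unicodedata.normalize("NFKD", word)
    # unicodedata.combining() is non-zero for combining marks such as U+0301.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Called with a `str`, the function skips the decode step entirely; the decode only matters when raw bytes arrive from a file or socket.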

Unresponsive Script (JS-jQuery Autocomplete)

jquery,text,autocomplete,highlight,diacritics
I'm working with jQuery Autocomplete and I'm kinda stuck on this one particular issue. My autocomplete source list isn't a simple array list; it stores database fields. I've replicated the functionality in this JSFiddle. To elaborate, in the autocomplete source I have 3 fields: item ID; a simple label, which contains...

Rename ä, ö, ü to ae, oe, ue

java,diacritics
We want to rename strings in such a way that "strange" characters like German umlauts are translated to their official non-umlaut representation. In Java, is there some function to convert such characters (i.e., handle the mapping), not only for the German umlauts, but also for French, Czech or Scandinavian characters? The reason...

Why does this code to replace accented chars with html codes fail to work?

c#,html,winforms,diacritics,accented-strings
I want to replace accented chars (such as á, ñ, ¿, ¡, etc.) with the corresponding HTML codes (such as &aacute;, &ntilde;, &iquest;, &iexcl;, etc.). For example, this line of text: Imposible me ha sido rehusarme á las repetidas instancias que el Caballero Trelawney, el Doctor Livesey y otros muchos...

PHP – Why do some umlaut characters show while others don't?

php,character-encoding,output,special-characters,diacritics
This is a simplified version of the code I'm currently working with. Why does it work fine when using a basic echo, but fail when trying to write only a part of the string? Should I add any encode/decode/locale/utf to make this work, and if so, how? <meta charset="utf-8">...

Removing diacritical marks from a Greek text in an automatic way

bash,diacritics,transliteration
I have a decompiled StarDict dictionary in the form of a tab-separated file: κακός <tab> bad, where <tab> stands for a tab character. Unfortunately, the way the words are defined requires the query to include all diacritical marks. So if I want to search for ζῷον, I need to have all the...

How to remove diacritics (accents) from a string?

string,dart,diacritics,unaccent
I'm trying to convert some strings that are in Czech, Spanish, French etc. I'd like to take out the accent marks in the letters while keeping the letter. (E.g. convert é to e, č to c, Ž to Z, ñ to n) What is the best way to achieve this?...

Why don't some diacritics get stripped?

.net,string,diacritics
I am using the method from this answer to remove special characters from words and change them to a simple form. This works pretty nicely for many basic accents, e.g. Malmö becomes "Malmo", München becomes "Munchen", Åge becomes "Age". However, this doesn't work on some other characters, for example: Strømsgodset...
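The excerpt does not show the linked method, so the following is only a sketch of one likely explanation, written in Python rather than .NET. It assumes the method decomposes the text (NFD) and drops non-spacing marks. Under that assumption, ö and Å split into a base letter plus a combining mark, while ø (U+00F8) has no Unicode decomposition and therefore passes through unchanged. The `FALLBACK` table and the `use_fallback` flag are hypothetical additions for illustration.

```python
import unicodedata

# Hypothetical fallback table for letters that have no Unicode decomposition.
FALLBACK = str.maketrans({"\u00f8": "o", "\u00d8": "O"})  # ø, Ø


def strip_marks(text, use_fallback=False):
    """Decompose to NFD and drop non-spacing marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Letters like ø survive decomposition, so map them explicitly if asked.
    return stripped.translate(FALLBACK) if use_fallback else stripped
```

A fuller table would be needed for other undecomposable letters; which ones to include, and what to map them to, is a per-language design choice.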