avoidbreak is not correctly working in pdflib 8

unicode,encoding,character-encoding,pdflib
I have a text in which the word "Versicherungs-Lebenslagen" should break only at the hyphen. This works, but the word "Lebenslagen" should stay on the same line if there is enough space. In my picture, the word is on the next line. My PDFlib code, which I took from the API doc,...

Can nginx re-encode XML documents, or alter XML headers?

xml,nginx,utf-8,character-encoding,iso-8859-1
I have a problem ultimately caused by a third party XML document whose actual encoding (ISO 8859-1 or Windows 1252, can't tell) doesn't match its declared encoding (UTF-8). I'm looking for creative workarounds. We already use nginx proxies for various content, so perhaps there is a way to either: Re-encode...

XMLPullParser black diamond question marks with certain characters

android,xml,character-encoding,xmlpullparser,questionmark
I'm making an Android app that needs to fetch and parse XML. The class for that was made following the instructions from http://www.tutorialspoint.com/android/android_rss_reader.htm and the fetcher method looks like this: public void fetchXML() { Thread thread = new Thread(new Runnable() { @Override public void run() { try { URL...

utf-8 characters get lost when converting from list to data.frame in R

r,utf-8,character-encoding,data.frame,locale
I am using R 3.2.0 with RStudio 0.98.1103 on Windows 7 64-bit. The Windows "regional and language settings" of my computer is English (United States). For some reason the following code replaces my Czech characters "č" and "ř" by "c" and "r" in the text "Koryčany nad přehradou", when I...

ImageMagick escape character?

windows,command-line,character-encoding,imagemagick,imagemagick-convert
Is there an escape character I can use to pick up a copyright symbol in an ImageMagick convert command label directive? I'm trying to mark a batch of images with a credit/copyright in the bottom-right corner. The batch is being handled by CMD script running in a Windows command line...

character encoding issue with the BCP and ó

sql-server,unicode,character-encoding,bcp
I have a file that needs to go to Poland. In my data I have an ó. When I output the data to a table the ó is still there as the field is NVARCHAR. When I cut and paste from the table into Excel or Notepad++ the ó stays...

Is sending UTF-8 encoded characters Network Safe?

java,networking,encoding,utf-8,character-encoding
The reason for encoding with the standard Base64 format is to make sure the data won't contain any bytes that may be interpreted as control characters on the network. This ensures that the same data is received on the other side of the network transfer. In this scenario, does UTF-8 character encoding provide the same as...

How to make Python Interactive Shell print cyrillic symbols?

python,shell,unicode,character-encoding,cyrillic
I'm using Pymorphy2 in my project as a Cyrillic morphological analyzer. But when I try to print out the list of words, I get this: >>> for t in terms: ... p = morph.parse(t) ... if 'VERB' in p[0].tag: ... t = p[0].normal_form ... elif 'NOUN' in p[0].tag: ... t...

Convert a String to binary sequence in C# with zero padding when not 8 bits of char

c#,string,character-encoding,char,ascii
I use this method, which I saw in one of the questions, to convert ASCII to a binary string: public string GetBits(string input) { StringBuilder sb = new StringBuilder(); foreach (byte b in ASCIIEncoding.UTF8.GetBytes(input)) { sb.Append(Convert.ToString(b, 2)); } return sb.ToString(); } But if the input is something like the message below...
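
The padding problem named in the title comes from `Convert.ToString(b, 2)` dropping leading zeros, so bytes below 128 produce fewer than 8 digits. In C# the usual remedy is `PadLeft(8, '0')`. The same idea as a minimal Python sketch (the function name `get_bits` is hypothetical, chosen to mirror the excerpt):

```python
def get_bits(text: str) -> str:
    """Return the UTF-8 bytes of `text` as 0s and 1s, exactly 8 digits per byte."""
    # format(b, "08b") left-pads with zeros, so every byte contributes 8 digits.
    return "".join(format(b, "08b") for b in text.encode("utf-8"))


print(get_bits("A"))   # 01000001
print(get_bits("\n"))  # 00001010
```

With fixed-width groups the output can be split back into bytes unambiguously, which is not possible when leading zeros are dropped.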

Encoding numbers

vb.net,character-encoding
I am a developer using high level languages. I usually take the lower level details for granted. I read that standards such as ASCII and Unicode are for character encodings. A character has to be stored as a number. Is this the same for numbers? For example, if I declare...

Handling count of characters with diacritics in R

r,unicode,character-encoding,nlp,linguistics
I'm trying to get the number of characters in strings with characters with diacritics, but I can't manage to get the right result. > x <- "n̥ala" > nchar(x) [1] 5 What I want to get is 4, since n̥ should be considered one character (i.e. diacritics shouldn't be...
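
The count of 5 arises because n̥ is stored as two code points: the letter n plus a combining mark. A rough Python sketch of the counting rule the asker describes, assuming the mark is U+0325 (COMBINING RING BELOW) and that skipping combining marks is an acceptable approximation of "one character" (full grapheme-cluster segmentation has more rules than this):

```python
import unicodedata


def count_base_characters(text: str) -> int:
    """Count code points that are not combining marks."""
    # unicodedata.combining() returns 0 for base characters and a non-zero
    # canonical combining class for combining marks.
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


x = "n\u0325ala"  # n + COMBINING RING BELOW + a + l + a
print(len(x))                    # 5 code points
print(count_base_characters(x))  # 4 base characters
```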

How to encode correctly for mailing with smtplib in Python

python,email,character-encoding
I'm trying to write a Python script that verifies some files and sends an email if there is a change. The script works well, but if I write some special characters, it won't send the mail... I've looked at websites about encoding but I cannot solve my problem. Here...

Ruby on Rails - UTF8 encoding problems in MySQL from ActiveRecord

mysql,ruby-on-rails,utf-8,character-encoding
I have a webapp using Ruby 1.9 and Rails 4. In my local VM (ubuntu), everything's ok. My DB and tables are using utf8_unicode_ci, and data are well saved into the tables and well printed on webapp pages. My problem is on my production server (EB on AWS). I'm using...

Convert Access Attachment data type to file system files

sql-server,vba,ms-access,character-encoding,access-vba
I have a lot of files stored as attached files in an Access db. I am going to move data to an SQL server and for that purpose I need to extract the attached files and turn them into file system files. This snippet works fine for images and pdf...

What should the JCA deployment descriptor (ra.xml) character encoding be?

character-encoding,xml-parsing,findbugs,jca,deployment-descriptor
Looking through JCA 1.7 specification I could only find in one of their examples on the Resource Adapter Deployment Descriptor the following (Chapter 13: Message Inflow P 13-50): This example is showing the usage of UTF-8 encoding, however there is nothing saying if this was an optional selection for the...

How to detect the language of UTF-8 encoded text?

node.js,utf-8,character-encoding,redis,language-detection
Redis uses UTF-8, and for my project I need to detect the language of UTF-8 encoded text. Is there any way to get a clue about the language of the text? EDIT: My project is written in Node.js. In Redis maybe lua script has a...

Why does JDK 8's Base64 use ISO-8859-1?

character-encoding,base64,ascii,java-8,iso-8859-1
I'm writing my own Base64 encoder/decoder for some constrained environments, and I found that Base64.Encoder#encodeToString says it uses ISO-8859-1 to construct a String from the encoded bytes. I presume that the ISO-8859-1 charset also covers the whole Base64 alphabet. Is there any possible reason not to use US-ASCII?...
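
The presumption in the excerpt can be checked directly: every Base64 output byte is below 0x80, and ISO-8859-1 maps that range exactly as US-ASCII does, so either charset yields the same string. A small Python check of that equivalence (it does not address why the JDK picked one over the other):

```python
import base64

# Encode every possible byte value so the whole Base64 alphabet shows up.
encoded = base64.b64encode(bytes(range(256)))

as_ascii = encoded.decode("ascii")
as_latin1 = encoded.decode("iso-8859-1")

print(all(b < 0x80 for b in encoded))  # True: output is pure 7-bit data
print(as_ascii == as_latin1)           # True: both charsets give the same string
```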

Can I Write Version Information API For Both CHAR And WCHAR?

winapi,character-encoding,version
I'm a little bit short of reaching my goal. GetFileVersionInfoSize() is working fine along with other two functions GetFileVersionInfo() and VerQueryValue(). I would like to just add more features to it to make it complete. I've coded it to run on WCHAR and would like to know making it run...

Effects of Non-ASCII Characters in HTML vs HTML Encoded Characters

html,utf-8,fonts,character-encoding,non-ascii-chars
I had an issue earlier today where someone couldn't compile a static site due to some non-ASCII characters in a kramdown file. While writing a small script that finds these characters in our content, I ran across a large number of non-HTML encoded special characters. What are the implications in...

How do I print roman languages (e.g. Spanish) /special characters in Javascript?

javascript,character-encoding
I've done some research and it turns out that to encode special characters we use encodeURI(component) and decodeURI. However when I try to do something like: var my_special_char = 'ñ'; my_div.innerHTML = decodeURI(encodeURI(my_special_char)) A "question mark" is printed. I found this (non-complete) table about special characters: http://www.javascripter.net/faq/accentedcharacters.htm Effectively when I do decodeURI("%C3%B1");...

BeautifulSoup truncates table

python,character-encoding,web-scraping,beautifulsoup
I am attempting to write a Python script to process all the joyo kanji. However, my script is only getting the first 504 elements of the table. The full table has 2,136 elements. This script demonstrates the problem: from bs4 import BeautifulSoup from urllib2 import urlopen url = "http://en.wikipedia.org/wiki/List_of_j%C5%8Dy%C5%8D_kanji" soup...

URL with Ukrainian characters giving UnicodeEncodeError

python,character-encoding,lxml
I'm trying to extract dictionary entry: url = 'http://www.lingvo.ua/uk/Interpret/uk-ru/вікно' # parsed_url = urlparse(url) # parameters = parse_qs(parsed_url.query) # url = parsed_url._replace(query=urlencode(parameters, doseq=True)).geturl() page = urllib.request.urlopen(url) pageWritten = page.read() pageReady = pageWritten.decode('utf-8') xmldata = lxml.html.document_fromstring(pageReady) text = xmldata.xpath(//div[@class="js-article-html g-card"]) either with commented lines on or off, it keeps getting an error:...

How to create a Persian file.txt and then explode it?

php,string,unicode,character-encoding,explode
I have a lot of Persian text and I want to explode it. I store my text in file.txt (so I have a file.txt containing Persian text). Now my problem is the charset. When I save the text into file.txt, it gives me an error: This file contains characters in Unicode format...

How to get the byte[] used to construct a String?

java,string,character-encoding,bytearray
I have some binary data that is being encoded as a UTF-8 string. How can I get the original data back from the string? The binary data is in no particular character encoding, so I'm not sure what conversion will give me what I want. Consider the following minimal example:...

Charset in Jsoup

html,character-encoding,jsoup
I use Jsoup library. After the execution of the following code: Document doc = new Document(language); File input = new File("filePath" + "filename.html"); PrintWriter writer = new PrintWriter(input, "UTF-8"); String contentType = "<%@ page contentType=\"text/html; charset=UTF-8\" %>"; doc.appendText(contentType); writer.write(doc.toString()); writer.flush(); writer.close(); In the output html file I receive the following...

PHP / MySQL: Certain characters not being encoded properly and appearing as question marks

php,mysql,encoding,utf-8,character-encoding
I am new to PHP and MySQL and hope someone can help me with this. I have a MySQL db with a table called "myTable". Both the server connection collation and the single columns containing text are set up with the data type "utf8_general_ci" and all characters appear correctly within...

UTF-8 encoding not working properly php

php,encoding,utf-8,character-encoding
When I try to print a string from a DB, the Spanish accent is turned into ?. Example: de f?tbol, salones para actividades. ?Para m?s informaci?n I have tried the following things: I added <meta http-equiv="Content-Type" content="text/html; charset=utf-8">, then I tried using utf8_encode() and utf8_decode(). None of these things have worked....

How to find UTF-8 reference of a composite unicode character

unicode,encoding,utf-8,character-encoding
At work, I have this issue where I need to find the UTF-8 reference of a composite Unicode character. The character in question is an "n" with a "^" on top: n̂. This is represented in Unicode by the character "n" (U+006E) followed by the circumflex accent (U+0302). What...
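
A composite character has no single UTF-8 code of its own: each code point is encoded separately and the byte sequences are concatenated. A short Python sketch for the pair named in the excerpt, U+006E followed by U+0302:

```python
import unicodedata

composite = "n\u0302"  # "n" (U+006E) + COMBINING CIRCUMFLEX ACCENT (U+0302)
encoded = composite.encode("utf-8")

# One byte for "n", two bytes for the combining accent.
print(" ".join(f"{b:02x}" for b in encoded))  # 6e cc 82

# NFC normalization does not shrink it: Unicode has no precomposed n-circumflex.
print(len(unicodedata.normalize("NFC", composite)))  # 2
```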

Handle windows-1252 and unicode in java [closed]

java,unicode,utf-8,character-encoding,bytearray
After a http request, I have got a byte array encoded with utf-8, e.g.: byte[] array = new byte[]{0xc3, 0xa4, 0xc2, 0x96} I decode the byte array using new String(array, "UTF-8"). In the example the first decoded char is 0xe4 which represents the letter ä in Unicode – so far...
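
The byte array in the excerpt is valid UTF-8, but its second character decodes to U+0096, a C1 control code. That is what appears when a windows-1252 byte (0x96, the en dash) has been treated as a Unicode code point before the UTF-8 encoding step. A hedged Python sketch of one possible repair, assuming every decoded code point below U+0100 really is a windows-1252 byte value:

```python
raw = bytes([0xC3, 0xA4, 0xC2, 0x96])

text = raw.decode("utf-8")
print([hex(ord(ch)) for ch in text])  # ['0xe4', '0x96']

# Assumption: code points below U+0100 are windows-1252 byte values in disguise.
# latin-1 turns each such code point back into the byte with the same number.
repaired = text.encode("latin-1").decode("cp1252")
print(repaired)  # ä–
```

Python's `cp1252` codec raises an error for the few byte values that windows-1252 leaves undefined (for example 0x81), so real input may need an `errors=` policy.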

Why is this character 口 causing my scanner to fail?

java,character-encoding
I'm using the Java Scanner. I have a .txt file with this text saved in it. PriceDB = { ["profileKeys"] = { ["Name - 回音山"] = "Name - 回音山", }, ["char"] = { ["Name - 回音山"] = { ["CurrentValue"] = "一口价:|cffffffff70,197|TInterface\\MoneyFrame\\UI-GoldIcon:0:0:2:0|t|r", }, }, } All I am trying to do...

SQL code copy / pasted from Excel document,with no visible errors, will not execute

sql-server,excel,tsql,character-encoding
I am using the code from this website (up until the end of the first case statement)- http://www.sqlusa.com/bestpractices2005/abbreviation/ DECLARE @StateCode char(2) SET @StateCode = 'TX' SELECT 'State name from state code' = CASE WHEN @StateCode = 'AK' THEN 'ALASKA' WHEN @StateCode = 'AL' THEN 'ALABAMA'... ...WHEN @StateCode = 'WV' THEN...

Erroneous encoding on form to Spring MVC

java,spring,spring-mvc,utf-8,character-encoding
I'm receiving form data in my Spring MVC controller, but when I try to input non-ASCII characters I receive rubbish, áéíóú gets converted into áéíóú. I'm using <%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%> in the jsp pages, Tomcat is configured to accept UTF-8 in the URI/Connection part and the form...

Python Encoding: Open/Read Image File, Decode Image, RE-Encode Image

python,image,encoding,character-encoding
Note: I don't know much about Encoding / Decoding, but after I ran into this problem, those words are now complete jargon to me. Question: I'm a little confused here. I was playing around with encoding/decoding images, to store an image as a TextField in a django model, looking around...

Text encoding - fine on Windows, not nix

linux,r,character-encoding,stringi
I have an issue with loading data between default encoding on Win and nix machines (ISO-8859-1 and UTF-8 respectively). Example - Windows first: library(stringi) dummy <- as.character("BØÅS") write(dummy, "saveFile") getData <- read.table("saveFile", header=F, sep="\t", quote="\"") reEncode=function(x) { stri_trans_general(x, "Latin-ASCII") } enCoded <- apply(getData, 1, reEncode) result <- as.data.frame(enCoded) In Windows...

Issue with Encoding base64 in PHP and Decoding base64 in Java

java,encoding,character-encoding,decode,string-decoding
A string, "gACA", was encoded in PHP using Base64. Now I'm trying to decode it in Java using Base64, but I get an absurd value after decoding. I have tried this: public class DecodeString { public static void main(String args[]){ String strEncode = "gACA"; //gACA is encoded string in PHP byte byteEncode[] = com.sun.org.apache.xerces.internal.impl.dv.util.Base64.decode(strEncode...

POST SHA512 Encrypted String as HTTP Header (in C#)

c#,hash,character-encoding,http-headers,sha512
I've been up against this for a few hours now and I can't seem to find a solution. I'm trying to encrypt a string using SHA512 and put it in a header for an HTTP request. I know that HTTP headers only like ASCII characters but every time I generate the encrypted...

Override incorrect declared text encoding in XML document using XMLHttpRequest

internet-explorer,character-encoding,xmlhttprequest,windows-1252
Our app is receiving data from a source with incorrect XML headers. Although the workaround in that post works (inserting an nginx proxy), we'd like to find a client-side solution, if there is one. So, is there a way to intercept an XML document and force the document to be...

Why does opening a file in two different encodings work as expected?

python,python-3.x,encoding,character-encoding,text-files
Quoting from here, The default encoding is platform-dependent, so this code might work on your computer (if your default encoding is utf-8), but then it will fail when you distribute it to someone else (whose default encoding is different, like CP-1252). Code mentioned in the above quote: fp = open('text.txt')...
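
The quoted advice can be illustrated by passing `encoding=` explicitly instead of relying on the platform default. The sketch below uses a temporary file and the sample text "café", both hypothetical and not from the question; it also shows that reading with the wrong single-byte encoding may succeed silently and return different text:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "text.txt")

# Write with an explicit encoding so the bytes on disk are known.
with open(path, "w", encoding="utf-8") as fp:
    fp.write("caf\u00e9")

with open(path, encoding="utf-8") as fp:
    utf8_text = fp.read()

# cp1252 assigns a character to both UTF-8 bytes of "é", so no error is raised.
with open(path, encoding="cp1252") as fp:
    cp1252_text = fp.read()

print(utf8_text)    # café
print(cp1252_text)  # cafÃ©
```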

Weird charset when inserting into mdb database using php

php,asp.net,ms-access,character-encoding
I have a web application I wrote using ASP.NET a couple of years ago; it saves form data into an mdb database and then converts it to an XLS file when the user chooses to. I changed the form handling code to PHP and I noticed that Arabic characters are converted to a different...

Wrong filename when using chinese characters

c++,string,character-encoding,char
I'm trying to create a file on Windows using a Chinese character. The entire path is inside the variable "std::string originalPath", however, I have a charset problem that I simply cannot figure out how to overcome. I have written the following code: #include <iostream> #include <boost/locale.hpp> #include <boost/filesystem/fstream.hpp> #include <windows.h> int main(...

How to set charset in mssql_connect?

php,pdo,utf-8,character-encoding
I'm using PHP 5.3 on FreeBSD. I want to select some UTF-8 data (Persian characters) from a SQL Server database and write them to a text file, but all the characters are written to the file as ????. I have searched for two days and read most of...

Encoding Issue Causing MalformedByteSequenceException in Xerces UTF8Reader

java,xml,character-encoding,xml-parsing,xerces
I'm encountering com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException with an XML file. I stepped through the Xerces code with a debugger and narrowed down the area where this was occurring. I was able to determine that by removing the "smart quote" characters in the document, the document becomes parseable. The document came with no DTD....

escape accented chars on utf-8 json

c#,json,character-encoding,json.net
The code below produces this output: {"x": "Art. 120 - Incapacità di intendere o di volere"} I need to change it to this (I suppose I have to change something in the encoding, but I don't know what): {"x": "Art. 120 - Incapacit\u00e0 di intendere o di volere"} Code: string label = "Art....

PHP: unreadable output characters

php,mysql,pdo,character-encoding
I am working on a page with PHP, but the text data I stored in my database shows unreadable characters when I retrieve it, as you can see from my online instance: http://www.taleemulislam-radio.com/test.php My PHP code for that page is: <?php require_once "pdo.php"; try { $sql =...

HTML: How to initiate HTML document header

html,character-encoding,doctype,head
I am still pretty new to HTML and programming in general so this is more of a curiosity question but I am asking as I want to use it the right way. Whenever I have to initiate an HTML document I start it as below and never observed any issues....

How to encode English plain-text (consisting only of letters a-z and whitespace) using a 5-bit character encoding in Python?

python,character-encoding
In Python, is there any way to encode English plain text (consisting only of lowercase letters a-z and whitespace, i.e. a total of 27 characters) using a 5-bit character encoding? If yes, please tell me how. To be more specific, say I have a string: s="hello world". After encoding this using...
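
Since 27 symbols fit in 5 bits (2^5 = 32), one way to do this is to map each character to a number and pack the 5-bit groups into bytes. The following is a minimal sketch, not a standard codec: the symbol table `ALPHABET` and the function names `encode5` and `decode5` are hypothetical choices made for illustration.

```python
ALPHABET = " abcdefghijklmnopqrstuvwxyz"  # 27 symbols, codes 0..26 fit in 5 bits


def encode5(text: str) -> bytes:
    """Pack each character of `text` into 5 bits, zero-padded to whole bytes."""
    bits = "".join(format(ALPHABET.index(ch), "05b") for ch in text)
    bits += "0" * (-len(bits) % 8)  # pad the tail up to a byte boundary
    return int(bits or "0", 2).to_bytes(len(bits) // 8, "big")


def decode5(data: bytes, length: int) -> str:
    """Unpack `length` characters from data produced by encode5."""
    bits = "".join(format(b, "08b") for b in data)
    return "".join(ALPHABET[int(bits[5 * i:5 * i + 5], 2)] for i in range(length))


s = "hello world"
packed = encode5(s)
print(len(s), len(packed))          # 11 characters packed into 7 bytes
print(decode5(packed, len(s)) == s)  # True
```

The character count is passed to `decode5` explicitly because the zero padding at the end could otherwise be read as a trailing space.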

'utf8' codec can't decode byte 0xf3

python,json,character-encoding
I am using Python 2.7 to read a JSON file. My code is: import json from json import JSONDecoder import os path=os.path.dirname(os.path.abspath(__file__))+'/json' print path for root, dirs, files in os.walk(os.path.dirname(path+'/json')): for f in files: if f.lower().endswith((".json")): fp=open(root + '/'+f) data = fp.read() print data.decode('utf-8') But I got the following error....

Why can't I read data from a file where I wrote text? (C)

c,character-encoding
I'm new to C so I got a little confused. Here is my code: #include <stdio.h> #include <string.h> #include <fcntl.h> #include <sys/stat.h> #include <stdlib.h> #define FILENAME "/var/note" int main(int argc, char *argv[]) { int userid = getuid(); int fd = open(FILENAME, O_WRONLY|O_CREAT|O_APPEND, S_IRUSR|S_IWUSR); write(fd, &userid, 4); } Basically I'm...

R encoding UTF-8: U+0080-U+009F

r,utf-8,character-encoding
I am struggling with some encoding issues. I have many textfiles that contain rows in the following format: https://dl.dropboxusercontent.com/u/94114397/example.txt According to Notepad++, these are all encoded in UTF-8 and most non-ASCII characters are displayed correctly, as you can see in lines 1 and 2. However, I have problems with some...

ANSI vs UTF-8 in web Browser

javascript,html,utf-8,character-encoding,ansi
My requirement is to allow users to type ANSI characters instead of UTF-8 when they are typing into the text fields of my webpages. I looked at the setting of the character set in the HTML meta tag <meta charset="ISO-8859-1"> That was helpful to display the content in ANSI instead...

What encoding to use when interpreting HTTP/1.1 header field value

http,character-encoding,http-headers
In the HTTP/1.1 specs I get this when it comes to defining headers: message-header = field-name ":" [ field-value ] [...] field-value = *( field-content | LWS ) field-content = <the OCTETs making up the field-value and consisting of either *TEXT or combinations of token, separators, and quoted-string> and the definition...

java.nio.charset.MalformedInputException: Input length = 1

java,io,character-encoding,malformed
I have this (stripped the HTML tags for the code example) function that builds an HTML table out of a CSV, but I get a runtime error every time I try to run it and I don't know why. Google says that maybe something with the encoding is wrong but I...

XMLHttpRequest returns wrongly encoded characters

javascript,pdf,utf-8,character-encoding,xmlhttprequest
I use XMLHttpRequest to read the PDF document http://www.virtualmechanics.com/support/tutorials-spinner/Simple2.pdf %PDF-1.3 %âãÏÓ [...] and print its content out to console: var xhr = new XMLHttpRequest(); xhr.onreadystatechange = function() { if (xhr.readyState === 4 && xhr.status === 200) { console.log(xhr.responseText); console.log('âãÏÓ'); } }; xhr.open('GET', 'http://www.virtualmechanics.com/support/tutorials-spinner/Simple2.pdf', true); xhr.send(); However, the console says %PDF-1.3...

Which encoding should I use for my Android Studio project?

android,android-studio,gradle,character-encoding
I had a problem similar to this one: Android Studio 1.2 - Project encoding mismatches by default The Android Studio project & gradle encoding is now set to windows-1252 instead of UTF-8. Is there any reason I should be using one of these over the other?...

Chrome browser not displaying UTF-8 characters

html,twitter-bootstrap,unicode,utf-8,character-encoding
I've searched stackoverflow and other websites, but I couldn't find anything useful to my case, so I decided to create a new question hoping that someone has an answer to my problem. My problem is with Chrome browser only, I'm creating a simple bootstrap website located here: http://www.farahfa.com/rikki On IE...

Get index of first non standard english character

c#,linq,character-encoding,globalization,diacritics
I'm trying to process a string and separate it into two parts when I find a character that is not in the standard English alphabet. For example: This is a stríng with áccents. I need to know the index of the first or every character with an accent (í). I...

unreadable quickcheck log file after a test routine

haskell,character-encoding,cabal,quickcheck
I made a test routine for a Haskell program with quickcheck. I declared it in my cabal file with : Test-Suite routine_de_test Type: exitcode-stdio-1.0 Hs-Source-Dirs: test Main-is: Tests.hs and launched it with : cabal configure --enable-tests cabal build cabal test The tests are processed correctly and I was expecting to...

How to convert base64 string to image binary file and save onto server [duplicate]

c#,asp.net,canvas,file-upload,character-encoding
This question already has an answer here: C# Base64 String to JPEG Image 3 answers As an example I have converted a canvas element with a re-sized image and posted into a hidden input field that's now encoded as data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD... This value then posted to the same page which...

Processing special characters

string,character-encoding,lua
Let's say I receive the following string in Lua: mÜ⌠⌠í∩ and would like to apply it to my current processing code, which is the following function inTable(tbl, item) for key, value in pairs(tbl) do if value == item then return true end end return false end function processstring(instr) finmsg = ""...

Add an attribute to an object with different kind of accent in Angular

javascript,angularjs,encoding,character-encoding,escaping
My use case is: I have an ng-model "myModel". I'm pulling the html for a form via ajax (because I have dynamic fields) and in it, the fields are described with: <input ng-model="myModel.myField" name="myField"> This works perfectly except when the name of the field has an accent in it (it is...

Ruby 2: Detect encoding from binary ASCII-8BIT data

ruby,encoding,utf-8,character-encoding,ascii-8bit
I have to load some data from external sources. When I look at the encoding, Ruby tells me ASCII-8BIT, binary file. However, some of the sources are encoded ISO-8859-1 and some of them are in UTF-8. When I try to convert the ISO-8859-1 encoded stuff to UTF-8, I get an...

How to decode string in specific charset and then encode it in UTF-8?

character-encoding,ada
How, in Ada, do I decode a string coming from the MS Windows terminal, then encode it in UTF-8?

MySQL sql dump wrong char encoding

mysql,wordpress,character-encoding
We made a WordPress SQL backup as an SQL dump. I can see that the CHARSET is set to utf8, but for some reason all the non-English text shows up like this: ╫ó╫¿╫¢╫ץ╫¬ ╫ó╫ש╫ª╫ץ╫ס Is this something we can fix? Which encoding is it?...
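
One plausible reading of the sample, offered as a guess rather than a confirmed diagnosis: the dump holds Hebrew text in UTF-8, and the viewer displayed those bytes using the DOS Hebrew code page 862, where the UTF-8 lead byte 0xD7 renders as the box-drawing character ╫. Under that assumption, reversing the wrong decoding step recovers readable Hebrew:

```python
# The garbled sample as shown in the question.
mojibake = "╫ó╫¿╫¢╫ץ╫¬ ╫ó╫ש╫ª╫ץ╫ס"

# Assumption: UTF-8 bytes were displayed with code page 862.
# Re-encoding with cp862 restores the original bytes; decoding as UTF-8 reads them.
fixed = mojibake.encode("cp862").decode("utf-8")
print(fixed)  # two Hebrew words, five letters each
```

If that holds, the dump itself may be intact, and only the program used to view it chose the wrong code page.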

Translate encoding of string

clojure,character-encoding,decode,windows-1252
I have a string that is in Windows-1252 encoding, but needs to be converted to UTF-8. This is for a program that fixes a UTF-8 file that has fields containing Russian text encoded in quoted-printable Windows-1252. Here's the code that decodes the quoted-printable: (defn reencode [line] (str/replace line #"=([0-9A-Fa-f]{2})=([0-9A-Fa-f]{2})" (fn...

phpmyadmin collation: latin and cyrillic

mysql,character-encoding,collation
Could someone tell me please how to configure MySQL DB in phpmyadmin for storing both latin and cyrillic data sets in the same table? Thanks!

Why does Python String concatenation work with Russian text but string.format() does not

python,csv,character-encoding,windows-1251
I'm trying to parse (and escape) rows of a CSV file that is stored in Windows-1251 character encoding. Using this excellent answer to deal with this encoding I've ended up with this one line to test the output, for some reason this works: print(row[0]+','+row[1]) Outputting: Тяжелый Уборщик Обязанности,1 литр While...

MySQL - SQL not searching for UTF 2019 (’) single quote

php,mysql,wordpress,character-encoding,phpmyadmin
I have the following character in one of my row entries in a MySQL database:- http://www.fileformat.info/info/unicode/char/2019/index.htm When I run the following sql query in phpmyadmin it says that no results are found. SQL:- SELECT * FROM `wp_posts` WHERE `post_title` LIKE '%ABC, D’EF%' I've copy pasted the title from phpmyadmin as:-...

Fast conversion of String to byte[] using UTF-16LE encoding

java,android,optimization,encoding,character-encoding
I need to get bytes of millions of string using this : String str="blablabla...."; // some UTF-16LE encoding string extracted from DB bytes=str.getBytes("UTF-16LE") But this is awfully slow. There are custom fast versions of getBytes but they don't support UTF-16LE. For example this is one of them: // http://stackoverflow.com/questions/12239993/why-is-the-native-string-getbytes-method-slower-than-the-custom-implemented-getb private...

How to prevent weird characters from showing up in web pages

php,mysql,character-encoding
Often when outputting strings from a database onto a webpage, special characters get displayed as some other weird characters (in my example, an em-dash gets turned into –). How do I prevent this? I'm using PHP + MySQL, and I'm not using any frameworks. I'm guessing this is caused by...

Char encoding and SQL in C#

c#,sql-server,character-encoding,collation
I have this SQL: select productid from products where productcode = @code, where @code is a parameter. Its value was ABCÊ but it matched ABC. Debugging it in Visual Studio showed � in the QuickWatch. The database has Latin1_General_CI_AS as its collation. The field type in the database is an nvarchar(50)...

Encoding and then decoding json with non-ascii characters in python 2.7

python,json,character-encoding
I have a Python application that encodes some objects to JSON, passes the JSON string to another program, and then reads in a possibly modified version of that JSON string. I need to check what has changed in the JSON-encoded objects. However, I'm having trouble with re-encoding non-ASCII characters....

Python: Unicode to html entities

python,unicode,encoding,character-encoding
I am currently having problems converting from Unicode to HTML entities. Here is my current code: >> name = u'\xc3\xa1\xc3\xa1\xc3\xa1\xc3\xa1' >> entities = name.encode('ascii', 'xmlcharrefreplace') >> print str(entities) &#195;&#161;&#195;&#161;&#195;&#161;&#195;&#161; Each \xc3\xa1 = á (multibyte character), but when I convert it to entities, I got 2 entities for a single...
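
The two entities per letter follow from the input itself: the string holds the UTF-8 bytes of "á" as two separate code points (U+00C3, U+00A1) rather than the single code point U+00E1. A Python 3 sketch of one way to repair that before producing entities, assuming every character in the string is such a misread byte:

```python
# Four copies of the UTF-8 bytes of "á", each byte stored as its own code point.
name = "\xc3\xa1" * 4

naive = name.encode("ascii", "xmlcharrefreplace")
print(naive)  # b'&#195;&#161;&#195;&#161;&#195;&#161;&#195;&#161;'

# latin-1 maps code points 0..255 straight back to bytes; then decode as UTF-8.
repaired = name.encode("latin-1").decode("utf-8")
entities = repaired.encode("ascii", "xmlcharrefreplace")
print(entities)  # b'&#225;&#225;&#225;&#225;'
```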

How do I compare each character of a String while accounting for characters with length > 1?

java,string,unicode,character-encoding,utf-16
I have a variable string that might contain any unicode character. One of these unicode characters is the han 𩸽. The thing is that this "han" character has "𩸽".length() == 2 but is written in the string as a single character. Considering the code below, how would I iterate over...
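
The length of 2 comes from UTF-16: characters outside the Basic Multilingual Plane are stored as a surrogate pair, and Java's `String.length()` counts UTF-16 code units. In Java, code-point-aware iteration is available through `String.codePoints()` or `codePointAt` combined with `Character.charCount`. Python strings iterate by code point already, which makes the distinction easy to show (the sample text `"a𩸽b"` is an illustrative assumption):

```python
text = "a𩸽b"

# Number of UTF-16 code units, i.e. what a UTF-16 based length would report.
utf16_units = len(text.encode("utf-16-le")) // 2
print(utf16_units)  # 4

# Iterating a Python str yields whole code points, so the han stays in one piece.
for ch in text:
    units = len(ch.encode("utf-16-le")) // 2
    print(f"U+{ord(ch):04X} takes {units} UTF-16 unit(s)")
```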

How to display special chars in html [duplicate]

php,html5,encoding,character-encoding
This question already has an answer here: UTF-8 all the way through 14 answers I have a web page and suddenly it is displaying special characters as Asian characters, e.g. ñ is displayed as 帽. This is the head of my template: <!DOCTYPE html> <html lang="es"> <head> <meta charset="utf-8">...

What is a code point and code space?

character-encoding,character,ascii
I was reading the Wikipedia article on code points, but not sure if I understand correctly. For example, the character encoding scheme ASCII comprises 128 code points in the range 0hex to 7Fhex So is 0hex a code point? Also could not find anything on code space. PS. If it's...

How to read a text file in ancient encoding?

character-encoding
There is a public project called Moby containing several word lists. Some files contain European alphabets symbols and were created in pre-Unicode time. Readme, dated 1993, reads: "Foreign words commonly used in English usually include their diacritical marks, for example, the acute accent e is denoted by ASCII 142." Wikipedia...

How to output encode a hidden field in jsp using spring escapeBody tag

spring,jsp,character-encoding
I am trying to use the <spring:escapeBody> for a hidden field. <input type="hidden" name="displayId" id="displayId" value="${displayIdValue}"/> I tried adding the <spring:escapeBody> around the value field, as follows: <input type="hidden" name="displayId" id="displayId" value=<spring:escapeBody>"${displayIdValue}"</spring:escapeBody>/> but it gives a syntax error. What is the right syntax? I have looked online but couldn't find...

Wrong character encoding in simple Haskell code

haskell,utf-8,character-encoding
I have problems with character encoding in Haskell. This simple program writes wrong results. What I am really interested in here is the encode function, which forces me to use ByteString. The application is: import Data.ByteString.Char8 (unpack, pack) import Data.ByteString.Lazy (toStrict) import Data.Csv (encode) -- cabal install cassava main = do -- (middle...

PHP DomDocument's getElementsByTagName changing encoding

php,xml,character-encoding,domdocument
I'm trying to parse an XML document and I am facing some problems with PHP's DOMDocument::getElementsByTagName that I never had before. This is my code to get the XML: $xmlToParse = new MXML('1,0','UTF-8'); $xmlToParse->loadXML(<myXML>); //print_r ($xmlToParse->saveXML());die(); When I use print_r, this is (part of) the result: <Hotel> <HotelId>10333</HotelId> <HotelName>Papillon Hotel</HotelName>...

impossible to replace the character with gsub

ruby-on-rails,ruby,character-encoding
We import a file that contains special characters. At first I tried Ruby's encoding and decoding methods, but nothing happened, so I used gsub. It works for most characters, except for the character groups ̩ and ̤; the rest are replaced without problems. Here is the replace method: def replace_chars(name)...

� characters importing CSV file

php,character-encoding
UPDATE: After checking whether Ajax was sending the CSV data with the correct characters, I can confirm that it does; it is on the PHP side that they go wrong. I am importing a CSV file with Spanish characters (á, ñ, ó, etc...). After importing, words like "Germán" transform to "Germ�n"....
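
The "Germán" to "Germ�n" symptom is what typically appears when single-byte (for example ISO 8859-1) data is decoded as UTF-8. A Python sketch of that mechanism, assuming the CSV is ISO 8859-1 encoded, which the excerpt does not state:

```python
# Assumption: the CSV was saved as ISO 8859-1 (Latin-1), one byte per character.
latin1_bytes = "Germán".encode("latin-1")

# Decoding those bytes as UTF-8 fails on the lone 0xE1 byte; with
# errors="replace" the decoder substitutes U+FFFD, the replacement character.
shown = latin1_bytes.decode("utf-8", errors="replace")

# Decoding with the encoding the file was actually written in recovers the text.
recovered = latin1_bytes.decode("latin-1")

print(shown, recovered)
```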

Set charset only to part of html

html,character-encoding
I have an HTML widget that loads on other people's websites, where the default encoding can be something other than UTF-8. How can I ensure that my widget will be rendered as UTF-8 (it's not an iframe)? Can I set the charset only for my "div block"?

Why/how does the browser decide ☃.net goes to xn--n3h.net

url,browser,unicode,character-encoding,iri
If we type http://☃.net/ into Firefox or Chrome, it takes us to http://xn--n3h.net/, which is a mirror of unicodesnowmanforyou.com. What I don't understand is by what rules the Unicode snowman can decode to xn--n3h; it doesn't look anything like UTF-8 or URL encoding. I think I found a hint while mucking...
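
The xn-- form is Punycode, the ASCII-compatible encoding used for internationalized domain names. As a sketch, Python's built-in `punycode` and `idna` codecs reproduce the mapping; note that the `idna` codec implements the older IDNA 2003 rules, which may differ from what current browsers apply.

```python
# Encode the single label with Punycode and add the "xn--" ACE prefix by hand.
label = "☃"
ace_label = "xn--" + label.encode("punycode").decode("ascii")

# The idna codec does the same per label for a whole host name.
host = "☃.net".encode("idna").decode("ascii")

print(ace_label, host)
```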

What is the difference between utf8mb4 and utf8 charsets in mysql?

mysql,encoding,utf-8,character-encoding,utf8mb4
What is the difference between the utf8mb4 and utf8 charsets in MySQL? I already know about the ASCII, UTF-8, UTF-16 and UTF-32 encodings, but I'm curious to know what the difference is between the 'utf8mb4' group of encodings and the other encoding types defined in MySQL server. Are there any special benefits/purposes of using utf8mb4...
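
For context, MySQL's utf8 charset (an alias for utf8mb3) stores at most three bytes per character, while utf8mb4 stores up to four, which is what characters outside the Basic Multilingual Plane need. A Python sketch that checks whether a string would fit the three-byte limit; the helper name is hypothetical and not part of any MySQL API:

```python
def fits_three_byte_utf8(text: str) -> bool:
    """Return True if every character encodes to at most 3 UTF-8 bytes."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)

print(fits_three_byte_utf8("ñ€"))    # accented letter and euro sign
print(fits_three_byte_utf8("😀"))    # emoji, outside the Basic Multilingual Plane
```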

Can't display german umlaut

php,mysql,character-encoding,str-replace
My main problem right now is that I can't display "ä, ü, ö" on my webpage; they get displayed as �. When I display my script with "echo" they seem normal, but when I run it included from my HTML page I get �. After I tried several things I've read...

ActiveRecord - illegal mix of collations

mysql,ruby-on-rails,ruby,activerecord,character-encoding
I am having an issue with Ruby's ActiveRecord in the Redmine application. Started PATCH "//issues/33135" for [ipaddress] at 2015-06-02 17:02:48 -0700 Processing by IssuesController#update as HTML Parameters: {"utf8"=>"✓", "authenticity_token"=>"[secret_token]", "issue"=>{"is_private"=>"0", "project_id"=>"949", "tracker_id"=>"4", "subject"=>"adgsasdg", "description"=>"asdggsad", "status_id"=>"1", "priority_id"=>"1", "assigned_to_id"=>"", "parent_issue_id"=>"", "start_date"=>"2015-06-02",...

Reading file containing Arabic language

c#,file,encoding,printing,character-encoding
I have a file that contains both Arabic and English words, letters, and numbers. I'm trying to print the file using the code from Here. When I open the file in Notepad, I see all kinds of funny and unprintable characters. When I save the same file as Unicode in Save as... file menu...

Encode\escape string for .NET [duplicate]

.net,encoding,character-encoding,escaping
This question already has an answer here: Can I convert a C# string value to an escaped string literal (11 answers). I have a string of assorted white-space, e.g. "\t\r ". I need to display it in an error message, but just showing white-space isn't helpful. Is there a...

Web API action returns FileContentResult that, if saved as .csv, will open as gibberish , while if as .txt, is ok. Why?

c#,excel,csv,utf-8,character-encoding
I am exporting a file via an HTTP GET response, using ASP.NET Web API. For that, I am returning a FileContentResult object, as in: return File(Encoding.UTF8.GetBytes(fileContents.ToString()), "text/plain; charset=UTF-8"); After being stuck for several minutes on encoding issues, I used Google's Advanced REST Client to perform the GET to the Web API controller's...

bottle_mysql Encoding failure

python,mysql,utf-8,character-encoding,bottle
My setup is as follows: a CentOS server running MySQL and a Rails server, on which I am currently working on a new Bottle application. I have a database whose data I want to share between the Rails and Bottle apps. Some of the data in my DB is in Greek. #!/usr/bin/env python #...

Can we convert ANSI encoded CSV file to utf-8 encoded file with javascript?

javascript,encoding,character-encoding
I have been looking for an answer to this question for days. I need to upload an ANSI-encoded CSV file to a web service. The front-end application reads the file, encodes its content to base64, and sends it to the web service in JSON format. But when I try to read...

String encoding - Shift_JIS / UTF-8

java,android,character-encoding
I get a string from a 3rd party library that is not well encoded. Unfortunately, I'm not allowed to change the library or use another one... So the actual problem is that the 3rd party library result string will encode characters like "è ò à ù ì ä ö ü,...

encoding filenames in zip archive to correct displaying in windows explorer

javascript,character-encoding,zip,windows-explorer
On our site we generate zip archives on the client side with the jszip library. Files in these archives have non-ASCII (Cyrillic) characters in their filenames. With 7zip, the filenames are displayed correctly. But some users of our site open these zip archives with Windows Explorer, and in that case the file names are displayed incorrectly....

RStudio character encoding issue: quotation marks replaced by \x92

r,character-encoding,rstudio,rstudio-server
I am reading a .csv file containing some naturally occurring text. Sometimes in the text, ' is used to serve as an apostrophe, sometimes ’ is used instead (see lines 2 and 6 of this table). When reading the file in RStudio on my laptop, I have no issue (both '...

Which encoding replaces “í” with “\303 \255”?

c#,utf-8,character-encoding,data-conversion
Does anyone know which encoding this is? They tell me it is UTF-8, but I can't see how. This input: aquí (notice the accent on the i) should produce this: aqu\303 \255 Seems this is based on this table https://www.acc.umu.se/~saasha/charsets/, but I can see how I can get the output...
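
The output in this excerpt looks like the UTF-8 bytes of í (0xC3 0xAD) written as three-digit octal escapes. A Python sketch under that reading; the helper name is hypothetical, and it does not reproduce the space between the two escapes shown in the excerpt:

```python
def octal_escape_utf8(text: str) -> str:
    """Keep ASCII bytes as-is and write every other UTF-8 byte as \\ooo (octal)."""
    return "".join(
        chr(b) if b < 0x80 else f"\\{b:03o}"
        for b in text.encode("utf-8")
    )

print(octal_escape_utf8("aquí"))
```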

Modelica encoding problems

utf-8,character-encoding,modelica,dymola,openmodelica
Since Modelica 3.2 (released March 2010) it is allowed to use arbitrary Unicode characters in comments, description strings and/or annotations. But for some reason I am having trouble with code like the following: within ; model ENCO_testing "code for investigating Dymola encoding problems" Real TempC "Temperature in °C"; parameter Real...

Are there any unicode/wide chars that encode to multiple encoded characters

c,character-encoding,internationalization
Consider wctomb(), which takes a wide character and encodes to the currently selected character set. The glibc man page states that the output buffer should be MB_CUR_MAX, while the FreeBSD man page states the output buffer size should be MB_LEN_MAX. Which is correct here? Are there any example wide char/encoding...

Laravel 5 charset not working correctly on the views. But it working well when I dump it from controller

php,laravel,utf-8,character-encoding,laravel-5
I'm facing a charset problem here. I'm developing an app that uses a SQL Server database. The database was not created for this app; it existed before it and works very well. I can't change anything in the database because it's too large and it's used by many other apps....

My flat files should be UCS-2, but I can't import into MySQL database

mysql,character-encoding
I have twenty pipe-delimited text files that I would like to convert into a MySQL database. The manual that came with the data says: Owing to the difficulty of displaying data for characters outside of standard Latin Character Sets, all data is displayed using Unicode (UCS-2) character encoding. All CSV...

All special characters are question marks in PHP/HTML

php,character-encoding,special-characters
PHP's default character set is UTF-8. All special characters in PHP and HTML are output as question marks ("?") in the browser. All data with special characters is stored as UTF-8 in the database fields. But when PHP reads the database and outputs to the browser, all special characters like copyright...