Inserting UTF-8 data into SQL Server 2012

javascript,sql-server,utf-8
I have an SQL Server 2012 DB where a table contains 2 columns of UTF-8 data (non-English, Tamil to be specific). The data type that I set for those columns is nvarchar(500). I get input (the UTF-8 data) from a JSP page (input type text). I can insert...

Utf8 accents in nodejs

javascript,node.js,utf-8
I tried to read a vcf file with UTF-8 encoding, and the result is: { "name": "=4A=61=76=69=65=72=20=4C=75=6A=C3=A1=6E", "tel": "2814682382" }. The problem is the accented characters, for example áéíóú. How can I convert name into a valid UTF-8 string? In the above example the string must be Javier Luján...
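The name value in this excerpt looks like quoted-printable text (each `=XX` is one byte in hex). A minimal sketch of the decoding step, written in Python rather than Node.js and assuming the bytes are UTF-8 as the question states; the variable names are illustrative only:

```python
import quopri

# Value copied from the excerpt above; the variable names are hypothetical.
encoded_name = "=4A=61=76=69=65=72=20=4C=75=6A=C3=A1=6E"

# quopri works on bytes: undo the =XX escapes first,
# then interpret the resulting raw bytes as UTF-8.
raw_bytes = quopri.decodestring(encoded_name.encode("ascii"))
decoded_name = raw_bytes.decode("utf-8")

print(decoded_name)
```

The two steps are separate on purpose: `=C3=A1` is a single character (á) spread over two bytes, so the escapes have to be turned back into bytes before any character decoding happens.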

Export (Android/Java) string data in with extended characters for import into Excel

java,android,excel,unicode,utf-8
I need to export string data that includes the 'degrees' symbol ("\u00B0"). This data is exported as a csv text file with UTF-8 encoding. As would be expected, the degrees symbol is encoded as two characters (0xC2, 0xB0) within the java (unicode) string. When the CSV file is imported into...
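To make the byte-level claim in this excerpt concrete, here is a small Python sketch (not the asker's Java code) showing that the degree sign is one code point in the string and two bytes once encoded as UTF-8:

```python
degree = "\u00b0"  # DEGREE SIGN, a single code point in the string

# Encoding to UTF-8 is what produces the two bytes 0xC2 0xB0.
utf8_bytes = degree.encode("utf-8")

print(len(degree), utf8_bytes.hex())
```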

How to find condition that start char in UTF-8 file is read, using FileStream and StreamReader?

c#-4.0,utf-8,io,stream
In C# .NET 4.0 (really 4.5.2), my code reads a UTF-8 file. FileStream fstream = new FileStream(path, FileMode.Open); BufferedStream stream = new BufferedStream(fstream); using (StreamReader reader = new StreamReader(stream, new UTF8Encoding())) { int i; while((i = reader.Read()) > -1) { //a guess at a condition that is true I.F.F. reader...

send and retrive arabic data from mysql database

php,android,mysql,utf-8,arabic
I want to fetch Arabic data from a MySQL database. I wrote the PHP code, but it gives me ????? for the Arabic data. Any help to make it work? <?php header('Content-Type: charset=utf-8'); $link=mysqli_connect("localhost","root","","arabicd"); mysql_set_charset('utf8',$link); if (mysqli_connect_errno($link)) { echo "Failed to connect to MySQL: " . mysqli_connect_error(); } mysql_query("SET character_set_results =...

Detecting corrupt characters in UTF-8 encoded text file

regex,encoding,awk,utf-8,scripting
I have a text file that was edited with the wrong character encoding and thus has some mojibake and corrupt characters in some of the strings when I open it using UTF-8. What scripting language would be the most efficient at detecting these corrupt characters? Perl is not an option....

Return a csv encoded in UTF-8 with BOM from django

django,csv,utf-8
I'm trying to output a CSV file that the user can open with Excel. I've encoded all strings in UTF-8, but when I open the file with Excel I see gibberish. Only after converting the file to UTF-8 with BOM (using Notepad++ on Windows) was I able to display the...
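The "UTF-8 with BOM" conversion described here can be done in code by encoding with Python's `utf-8-sig` codec. The sketch below uses only the standard library; `build_csv_with_bom` and the sample rows are hypothetical, and wiring the bytes into a Django response is left out because the rest of the question is truncated:

```python
import csv
import io


def build_csv_with_bom(rows):
    """Return CSV bytes that start with the UTF-8 byte order mark."""
    text_buffer = io.StringIO(newline="")
    csv.writer(text_buffer).writerows(rows)
    # "utf-8-sig" prepends EF BB BF and then encodes as ordinary UTF-8.
    return text_buffer.getvalue().encode("utf-8-sig")


payload = build_csv_with_bom([["name", "city"], ["Strömgren", "Malmö"]])
print(payload[:3].hex())
```

The returned bytes could then be used as the body of whatever HTTP response the view produces.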

Html2canvas image capturing issue with UTF-8 characters

javascript,html5,canvas,utf-8
I want to capture my webpage. To do this I found html2canvas, but when I use it as shown below, my UTF-8 (Persian) characters get garbled and their direction is broken, as you can see. HTML: <div id="wrapper"> <span>این کاراکتر ها بهم میریزند</span> </div> JavaScript: $(document).ready(function() { html2canvas($("#wrapper"), { onrendered: function (canvas)...

unicode converting in RestTemplate in Spring

java,spring,unicode,utf-8
My aim is to get user info by accessToken using the Facebook API. I get a response, but the email in this response looks like this: aaaaaa\u0040mail.com. To convert it I added some properties, but this doesn't work: RestTemplate restTemplate = new RestTemplate(); restTemplate.getMessageConverters().add(0, new StringHttpMessageConverter(Charset.forName("UTF-8"))); String facebook = restTemplate.getForObject( "https://graph.facebook.com/me?access_token=" + facebookAccessToken, String.class); How can...
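The `\u0040` in this excerpt is a JSON string escape for `@`, so it disappears once the body is parsed as JSON rather than read as a raw string. A Python sketch of that idea (not Spring code); the response body below is a made-up stand-in for the real API response:

```python
import json

# Hypothetical response body; only the email value comes from the excerpt.
raw_body = '{"email": "aaaaaa\\u0040mail.com"}'

# A JSON parser resolves \uXXXX escapes while parsing string values.
parsed = json.loads(raw_body)

print(parsed["email"])
```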

How to process a file in correct encoding in powershell?

mysql,powershell,encoding,utf-8,cmd
I have a MySQL's SQL file which is encoded in UTF8 called data.sql (which is produced by the mysqldump command). If I run the following in a cmd.exe console the file is correctly processed. mysql --defaults-file=mysql.update2.ini --batch --raw --database=test1 --default-character-set=utf8 < "data.sql" If I run the following in a powershell...

How to display Arabic unicode text in page that retrieved from database

java,unicode,utf-8,xhtml,arabic
I need your help in displaying some Arabic text which is stored in a variable in the xhtml page. I have configured my project in jdeveloper to include UTF-8 in the properties and the Arabic text is displayed correctly. I have a variable called bankName and it has the unicode...

Read utf-8 character from byte stream

python-3.x,utf-8,utf8-decode
Given a stream of bytes (generator, file, etc.) how can I read a single utf-8 encoded character? This operation must consume the bytes of that character from the stream. This operation must not consume any bytes of the stream that exceed the first character. This operation should succeed on any...
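One possible approach to the requirements listed here is to read the lead byte, derive the sequence length from it, and then read exactly that many continuation bytes. The sketch below assumes a file-like object whose `read(n)` returns `n` bytes unless the stream ends; `read_utf8_char` is a hypothetical helper, not a standard-library function:

```python
import io


def read_utf8_char(stream):
    """Read exactly one UTF-8 encoded character from a binary stream."""
    first = stream.read(1)
    if not first:
        return ""  # end of stream
    lead = first[0]
    # The lead byte alone tells us how many continuation bytes follow.
    if lead < 0x80:
        extra = 0
    elif 0xC2 <= lead <= 0xDF:
        extra = 1
    elif 0xE0 <= lead <= 0xEF:
        extra = 2
    elif 0xF0 <= lead <= 0xF4:
        extra = 3
    else:
        raise ValueError("invalid UTF-8 lead byte: 0x%02x" % lead)
    rest = stream.read(extra) if extra else b""
    if len(rest) != extra:
        raise ValueError("stream ended inside a character")
    # decode() validates the continuation bytes for us.
    return (first + rest).decode("utf-8")


demo_stream = io.BytesIO("aλ好\U0001F605!".encode("utf-8"))
demo_chars = [read_utf8_char(demo_stream) for _ in range(6)]
print(demo_chars)
```

For well-formed input this never consumes bytes beyond the current character. For malformed input the continuation bytes have already been consumed by the time `decode` raises, so a stricter version would need to check them one at a time.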

cut off last rune in UTF string

utf-8,go
How to cut off last rune in UTF string? This method is obviously incorrect: package main import ("fmt" "unicode/utf8") func main() { string := "你好" length := utf8.RuneCountInString(string) // how to cut off last rune in UTF string? // this method is obviously incorrect: withoutLastRune := string[0:length-1] fmt.Println(withoutLastRune) } Playground...
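The slice in this excerpt counts runes but indexes bytes, which is why it cuts in the wrong place. The underlying idea, sketched in Python on raw UTF-8 bytes rather than in Go, is to walk backwards over continuation bytes (bit pattern `10xxxxxx`) until a lead byte is reached; `drop_last_code_point` is a hypothetical helper and assumes well-formed input:

```python
def drop_last_code_point(data: bytes) -> bytes:
    """Remove the final code point from well-formed UTF-8 bytes."""
    if not data:
        return data
    end = len(data) - 1
    # Continuation bytes look like 10xxxxxx; skip back to the lead byte.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


trimmed = drop_last_code_point("你好".encode("utf-8")).decode("utf-8")
print(trimmed)
```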

Erroneous encoding on form to Spring MVC

java,spring,spring-mvc,utf-8,character-encoding
I'm receiving form data in my Spring MVC controller, but when I try to input non-ASCII characters I receive rubbish, áéíóú gets converted into áéíóú. I'm using <%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%> in the jsp pages, Tomcat is configured to accept UTF-8 in the URI/Connection part and the form...

store Hebrew in database with utf- 8 encoding using php and mysql

html,mysql,internet-explorer,encoding,utf-8
I am building a web site with database in Hebrew (using php and mysql). on general I use chrome to test my job... today I tried explorer :-0 (yeh... you must be thinking "why the f*** would he do it... ;) ). I found out that all the data that...

Delphi - converting string back from UTF-8

osx,delphi,utf-8
I am having a problem converting a UTF-8 encoded string back into something usable by delphi. The application is written in XE8 and is being deployed on windows and OSX. The application uses the LimeLM API dll and dylib libraries on windows and OSX respectively. Everything works fine on windows,...

Include Unicode Signature (BOM) in HTML files or not?

html,utf-8,byte-order-mark
In Dreamweaver I have the option "Include Unicode Signature (BOM)". If I check this box and save the file the HTML file it looks good when viewed in the web browser. If not it gives me strange symbols for Swedish letters like åäö. If I serve this HTML file with...

one eclipse install not displaying utf8

java,eclipse,svn,utf-8
I have Eclipse installed on an old machine and a new machine. I download from svn repository on old Eclipse, UTF8 character in a java file displays fine. I download onto new Eclipse, UTF8 character displays as "?". I deduce that the file is saved correctly in SVN. I have...

Which encoding replaces “í” with “\303 \255”?

c#,utf-8,character-encoding,data-conversion
Does anyone know which encoding this is? They tell me this is UTF8 but I can't see how. This input: aquí (notice the accent on the i) should produce this: aqu\303 \255 Seems this is based on this table https://www.acc.umu.se/~saasha/charsets/, but I can see how I can get the output...
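One reading of this output is that it is plain UTF-8 with each non-ASCII byte written as a backslash followed by its octal value: í encodes to the bytes 0xC3 0xAD, which are 303 and 255 in octal. A Python sketch of that interpretation (not C# code); `octal_escape` is a hypothetical helper, and it does not reproduce the space that appears between the two escapes in the excerpt:

```python
def octal_escape(text):
    """Write non-ASCII UTF-8 bytes as backslash + three octal digits."""
    pieces = []
    for byte in text.encode("utf-8"):
        if byte >= 0x80:
            pieces.append("\\%03o" % byte)
        else:
            pieces.append(chr(byte))
    return "".join(pieces)


escaped = octal_escape("aquí")
print(escaped)
```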

C# - Create .txt file as UTF-8 instead ANSI from plaintext

c#,utf-8
I enter UTF-8 characters, for example 正體字/繁體字, as plain text. I click the button to save the text as a .txt file, but when I click the button to load the .txt file into the program again, question marks ?????? appear in the plain text. load .txt file richTextBox1.SaveFile("notes.txt", RichTextBoxStreamType.PlainText); save the plaintext...

What -C flag number in perl makes UTF-8 “just work”?

perl,utf-8,utf8-decode
My setup: perl-5.20.2, UTF-8 environment. Consider the following two bash examples. The first one works OK, the second doesn't. echo -n 'привет мир' | perl -MEncode -le '$a=decode("utf8",<>); $x=decode("utf8","мир"); print encode("utf8",sprintf("% 11s",$a)) if $a=~/$x/'|grep -q ' привет мир' && echo OK for (( i=0; $i < 512; i=$((i+1)) )); do...

How to read utf-8 encoded XML file in PHP?

php,xml,utf-8
The code was working fine: it was getting values from an XML file and generating a form from those values. I added encoding="UTF-8" to the XML tag, and now it gives the following warnings and error. Thank you in advance. Warning: DOMDocument::load(): parsing XML declaration: '?>' expected in file:///C:/xampp/htdocs/Urdu/english.xml, line: 1 in C:\...

PHP strings have same encoding (UTF8) and appear as identical in browser but are not equal

php,string,curl,utf-8,comparison
So I need to compare two strings (one is the result of a cURL call to a remote URL, using charset UTF8, and the other is hardcoded in my script, UTF8 as well). The strings look the same but when I compare them using strcmp(), the result is -44. I...

Send Utf8(persian) to Server By HttpURLConnection

android,utf-8,httpurlconnection
I want to send my information to a server, but I can't send Persian (Farsi) characters. Please help me. Give me sample code for it...

How to make the output from Text::CSV utf8?

perl,csv,encoding,utf-8
I have a CSV file, say win.csv, whose text is encoded in windows-1252. First I use iconv to make it in utf8. $iconv -o test.csv -f windows-1252 -t utf-8 win.csv Then I read the converted CSV file with the following Perl script (utfcsv.pl). #!/usr/bin/perl use utf8; use Text::CSV; use Encode::Detect::Detector;...

Laravel 5 charset not working correctly on the views. But it working well when I dump it from controller

php,laravel,utf-8,character-encoding,laravel-5
I'm facing a charset problem here. I'm developing an app that uses a sql server database. The database was not created for this app, it exists before it and works very well. I can't change anything on the database because its too large and its used by many other apps....

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4: ordinal not in range(128)

python,api,utf-8
i have taken this code from https://github.com/davidadamojr/TextRank and i am facing this problem. i tried to solve by placing utf-8 in "keyphrases = decode('utf-8').extractKeyphrases(text)" but failed. here is the code: """ From this paper: http://acl.ldc.upenn.edu/acl2004/emnlp/pdf/Mihalcea.pdf External dependencies: nltk, numpy, networkx Based on https://gist.github.com/voidfiles/1646117 """ import nltk import itertools from operator...

echo HINDI language word stored in an array in PHP script

php,utf-8,echo,meta,hindi
I am trying to echo hindi script words stored in an array. The output is getting displayed when i use print_r() but not when i use echo. The code is below. what changes do I need to make? <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> <?php $base_arr=array("सूर्योदय","कँगन","ख...

Delete weird ANSI character and convert accented ones using Python

python,encoding,utf-8,ansi
I've downloaded a bunch of Spanish tweets using the Twitter API, but some of them have strange ANSI characters that I don't want there. I have around 18000 files and I want to remove those characters. I have all my files encoded as UTF-8. For example: b'Me quedo con una...

Convert byte array from utf-16 to utf-8

c++,utf-8,utf-16
I have a byte array uint8_t array[] = {0x00, 0x72, 0x00, 0x6f, 0x00, 0x6f, 0x00, 0x74}; I know, that in text this is "root"; I have a function that should convert utf-16 to utf-8. Here is the code: inline bool convertUcs2ToUtf8(const std::vector<char> &from, std::string* const to) { return ucnvConvert("UTF-16", "UTF-8",...
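The array in this excerpt has the zero byte first in each pair, so it is big-endian UTF-16 without a byte order mark. A Python sketch of the conversion (not the ICU-based C++ from the question), with the byte order named explicitly because there is no BOM to detect it from:

```python
raw = bytes([0x00, 0x72, 0x00, 0x6F, 0x00, 0x6F, 0x00, 0x74])

# No BOM is present, so the byte order has to be stated: big-endian here.
text = raw.decode("utf-16-be")
utf8_bytes = text.encode("utf-8")

print(text, utf8_bytes)
```

Decoding the same bytes as little-endian would not fail, but it would yield four unrelated code points instead of "root".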

ODBC php Excel Encoding Issues

php,excel,encoding,utf-8,odbc
I'm making an ODBC connection with php to an Excel file. However when I encounter characters like "~", "^", "´" they appear messed up as this: Mês -> M?s Formação -> Forma??o I'm doing the following utf-8 treatment just to get the column names: $con = odbc_connect($odbcName, $dbUser, $dbPassword) or...

PHP / MySQL: Certain characters not being encoded properly and appearing as question marks

php,mysql,encoding,utf-8,character-encoding
I am new to PHP and MySQL and hope someone can help me with this. I have a MySQL db with a table called "myTable". Both the server connection collation and the single columns containing text are set up with the data type "utf8_general_ci" and all characters appear correctly within...

download xml file from url using winhttp in excel - CHARSET=UTF-8

php,excel-vba,utf-8,winhttp
I am trying to automate downloading of xml file from a url. Even after multiple attempts I couldn't think of possible solution to this and wonder if anyone could help me by just looking at the output and response header, url being sensitive. Using the excel I am doing a...

Remove UTF-8 substring in sqlite

sqlite,utf-8
I am trying to remove some invisible characters from a table. I tried this query: UPDATE table SET text = REPLACE(text, x'202B', '' ) with no luck. I also tried selecting it using: SELECT REPLACE(text, x'202B', '@@@@') AS text FROM table but nothing is replaced, so I'm guessing that it...
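A likely reason the query in this excerpt matches nothing is that `x'202B'` is a two-byte blob (0x20 0x2B), whereas the character U+202B is stored in a UTF-8 database as the three bytes E2 80 AB. The sketch below uses Python's `sqlite3` module with an in-memory database and binds the character as a parameter; the table and column names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")  # hypothetical table
conn.execute("INSERT INTO notes (body) VALUES (?)", ("\u202bhello",))

# Bind the character itself so the driver sends its real UTF-8 bytes.
conn.execute("UPDATE notes SET body = REPLACE(body, ?, '')", ("\u202b",))
cleaned = conn.execute("SELECT body FROM notes").fetchone()[0]

utf8_bytes = "\u202b".encode("utf-8")
print(repr(cleaned), utf8_bytes.hex())
```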

Ruby on Rails - UTF8 encoding problems in MySQL from ActiveRecord

mysql,ruby-on-rails,utf-8,character-encoding
I have a webapp using Ruby 1.9 and Rails 4. In my local VM (ubuntu), everything's ok. My DB and tables are using utf8_unicode_ci, and data are well saved into the tables and well printed on webapp pages. My problem is on my production server (EB on AWS). I'm using...

Modelica encoding problems

utf-8,character-encoding,modelica,dymola,openmodelica
Since Modelica 3.2 (released March 2010) it is allowed to use arbitrary Unicode characters in comments, description strings and/or annotations. But for some reason I am having trouble with code like the following: within ; model ENCO_testing "code for investigating Dymola encoding problems" Real TempC "Temperature in °C"; parameter Real...

Python encoding issue through sockets

python,sockets,encoding,utf-8
I am creating a short python script which requires me to send text back and forth between two computers via sockets. Now when I try to test my application locally on the same computer using telnet and it requires me to use the 'utf-8' encoding when I use the bytes()...

Issue with Mac OSX Terminal, shows special character codes like space, brakets

osx,utf-8,terminal
I have recently used "localedef" command to add support for the multiple locales. After doing that I noticed on my terminal for every "space, backslash, forward slash etc" it is showing UTF code like <0200> <002d> etc. I really want to disable this behaviour as it is really difficult to...

Attributes for UTF-8 characters

c++,utf-8,ncurses
Ncurses can display characters with attached attributes via chtypes, which are constructed by or'ing a single character with attributes bitmasks thusly : addch('a' | A_REVERSE); However, after enabling UTF-8 support, pushing a multibyte character to the screen must be done via addstr(char const*), and there is no room for attributes....

How to use ctype_alpha with UTF-8

php,utf-8
How to use ctype_alpha with UTF-8? I have this code: if(empty($_POST) === false) { if (isset($_POST['first_name']) && !empty ($_POST['first_name'])){ if (ctype_alpha($_POST['first_name']) === false) { $errors[] = 'Please enter your First Name with only letters!'; } } } Here I check if everything is fine. If not I get an error....

ANSI vs UTF-8 in web Browser

javascript,html,utf-8,character-encoding,ansi
My requirement is to allow users to use(type) ANSI characters instead of utf-8 when they are typing in to the text fields of my webpages. I looked at the setting of the character set in html meta tag <meta charset="ISO-8859-1"> That was helpful to display the content in ANSI instead...

UTF-8 for URL, Java

java,utf-8
So I'm trying to scrape a grammar website that gives you conjugations of verbs, but I'm having trouble accessing the pages that require accents, such as the page for the verb "fág". Here is my current code: String url = "http://www.teanglann.ie/en/gram/"+ URLEncoder.encode("fág","UTF-8"); System.out.println(url); I've tried this both with and without...

Send Arabic Text to Web Service

excel,vba,excel-vba,utf-8
Okay, I'm confused. My problem is that I want to send the contents of my Excel spreadsheet to a HTTP POST web service, in UTF8 encoding - i.e I want to support Arabic text. I can iterate through the cells of a spreadsheet writing to a stream: Dim fsT 'As...

iOS SQLite SELECT with UTF 8 characters

ios,sqlite,utf-8
I've been reading all questions related to SQLite encoding with no success, so I'll try to ask about my specific case. I have an iOS app with a prepopulated SQLite database. This SQLite database has been converted from a MySQL database. Both the SQLite and MySQL databases have UTF8 encoding (the...

AngularJS non-ascii property name support

angularjs,utf-8,non-ascii-chars
I don't know how to use non-ascii property name in AngularJS. I could print a value by using a['property_name'] instead of a.property_name, but I couldn't use the same way in 'orderBy'. If I click on 'name', sorting would happen, but if I click on '가격_price', nothing would happen and an...

json_encode() throwing an error: “Invalid UTF-8 sequence in argument”

php,json,utf-8
<h4>A PHP Error was encountered</h4> <p>Severity: Warning</p> <p>Message: json_encode() [<a href='function.json-encode'>function.json-encode</a>]: Invalid UTF-8 sequence in argument</p> <p>Filename: controllers/share.php</p> <p>Line Number: 130</p> It used to work before, version php 5 [which i believe is the latest major PHP version]....

Extracting Double Byte Characters/substring from a UTF-8 formatted String

java,string,encoding,utf-8
I'm trying to extract emojis and other special Characters from Strings for further processing (e.g. a String contains '😅' as one of its Characters). But neither string.charAt(i) nor string.substring(i, i+1) work for me. The original String is formatted in UTF-8 and this means, that the escaped form of the above...
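Part of the difficulty described here is that an emoji such as U+1F605 lies outside the Basic Multilingual Plane: it takes four bytes in UTF-8 and two 16-bit code units (a surrogate pair) in UTF-16, which is the unit that index-based string operations in Java work with. A Python sketch of the sizes involved (not Java code):

```python
emoji = "\U0001F605"

utf8_bytes = emoji.encode("utf-8")       # four bytes
utf16_bytes = emoji.encode("utf-16-be")  # two 16-bit code units

print(len(emoji), utf8_bytes.hex(), utf16_bytes.hex())
```

Any extraction that steps one 16-bit unit at a time will split such a character in half, so iteration has to go by code point instead.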

How to find UTF-8 reference of a composite unicode character

unicode,encoding,utf-8,character-encoding
At work, I have this issue where I need to find the UTF-8 reference of a composite unicode character. The character in question is a "n" with a "^" on top : n̂. This is represented in unicode by the character "n" (U+006E) followed by the circumflex accent (U+0302). What...
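Since the character is two code points, its UTF-8 form is simply the UTF-8 of each code point in sequence. A short Python sketch that prints the bytes:

```python
composite = "n\u0302"  # U+006E followed by U+0302 COMBINING CIRCUMFLEX ACCENT

# Each code point is encoded independently and the results are concatenated.
utf8_bytes = composite.encode("utf-8")

print(utf8_bytes.hex(" "))
```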

Wrong output when str_replace with acute ( ´ ) in utf-8 website [duplicate]

php,html,utf-8
This question already has an answer here: UTF-8 all the way through (14 answers). I'm trying to replace an apostrophe ( ' ) with an acute accent ( ´ ) in a string after it has been entered into a form and submitted. <?= str_replace("'","´",$_POST['string']) ?> The string, for example, is: "Jan's Motel" >...

Is the “UTF8” data in my database really encoded correctly?

php,mysql,utf-8
I have a PHP application with a MySQL database that "should" contain UTF8 encoded data. With regard to unicode characters, my application appears to work properly from beginning to end. If someone submits "Strömgren" into my database (via an HTML form), I see "Strömgren" when I get the data back...

Perl XML::Twig character encoding

xml,perl,encoding,utf-8
I have a set of XML files with a combination of non-simple ASCII characters and encoded characters, e.g.: ... many 8-bit characters such as é, &#10906;, and ñ. (The second character is the ampersand-semicolon version of ⪚. The first and third are the unescaped characters.) The files are in UTF-8...

Haskell: quoteFile fails on text file with “invalid byte sequence” on unicode characters

linux,haskell,unicode,encoding,utf-8
I'm facing issue with quoteFile in my virtual environment (Debian Wheezy with GHC 7.8.4 installed). I have described file oriented version of st quasi quoter from Text.Shakespeare.Text: import Language.Haskell.TH.Quote (QuasiQuoter, quoteFile) import Text.Shakespeare.Text (st) sfFile :: QuasiQuoter stFile = quoteFile st This works very well on my host machine, however,...

SQL*Loader does not recognize delimiter “¥”

oracle,utf-8,delimiter,sql-loader
When trying to load an UTF-8 file with ¥ separator, it doesn't separate the fields when it finds the character. We get an ORA-12899 error from the value exceeding the column size, since it tries to put the entire line into the first column. In UTF-8 , ¥ is Â¥...

Required to convert a String to UTF8 string

c++,c,utf-8,iconv,wchar-t
Problem Statement: I am required to convert a generated string to UTF8 string, this generated string has extended ascii characters and I am on Linux system (2.6.32-358.el6.x86_64). A POC is still in progress so I can only provide small code samples and complete solution can be posted only once ready....

Handle windows-1252 and unicode in java [closed]

java,unicode,utf-8,character-encoding,bytearray
After a http request, I have got a byte array encoded with utf-8, e.g.: byte[] array = new byte[]{0xc3, 0xa4, 0xc2, 0x96} I decode the byte array using new String(array, "UTF-8"). In the example the first decoded char is 0xe4 which represents the letter ä in Unicode – so far...

Remove all non utf-8 characters from file with no output in terminal

ubuntu,utf-8,output
I am a new Ubuntu user dealing with a very large file with a few non-utf8 characters that can be safely skipped. I found another stack overflow question How to remove non UTF-8 characters from text file that gave a way to remove those characters, using the command iconv -f...

How to set charset in mssql_connect?

php,pdo,utf-8,character-encoding
I'm using PHP 5.3 on FreeBSD. I want to select some UTF-8 data (Persian characters) from a SQL Server database and write it to a text file, but because of UTF-8 all the characters are written to the file as ????. I have searched for two days and read most of...

Using UTF-8 identifier

java,android,encoding,utf-8
I get a String stream from an HTTP request. The stream looks like: <?xml version="1.0" encoding="utf-8"?> The first three tokens mean that the String is encoded in UTF-8. I'm creating files from the String. While reading them I get an error: With this method I'm creating files from that String: private...

BeautifulSoup gives garbage for html conversion

python,html,pdf,utf-8,beautifulsoup
I am trying to scrape this url = 'http://www.jmlr.org/proceedings/papers/v36/li14.pdf url. This is my code html = requests.get(url) htmlText = html.text soup = BeautifulSoup(htmlText) print soup #gives garbage However it gives weird symbols that I think is garbage. It's an html file so it shouldn't be trying to parse it as...

PHP - Changing charset for arabic characters using file_get_contents

php,utf-8,file-get-contents,arabic
I am getting an arabic translation using google, this is my code: header('Content-Type: text/html; charset=UTF-8'); $page=file_get_contents("http://www.google.com/translate_t?langpair=en|ar&text=hello",FILE_TEXT); $page=substr($page,strpos($page,"TRANSLATED_TEXT")+strlen("TRANSLATED_TEXT")+2); $page=substr($page,0,strpos($page,"';INPUT_TOOL_PATH")); echo mb_detect_encoding($page); // edited 2015/05/26 echo mb_convert_encoding($page, 'UTF-8', 'ISO-8859-6'); If you follow the link on the file_get_contents function, you will see...

How can I use special characters in angular directives attributes?

javascript,angularjs,utf-8,special-characters,directive
I would like to use strings including german characters (Ä, Ö, Ü) in attributes of a custom angularJS directive. For example: <my-custom-directive my-label="Lärm" /> Another example is the ui.bootstrap.tabs directive: <tabset> <tab heading="Lärm"> content ... </tab> <tab heading="Second Heading"> content ... </tab> </tabset> This results in a tab with heading...

MySQL utf8_czech_ci vs utf8_general_ci

mysql,utf-8
I have an application which is mostly in the Czech language, which is why we use utf8_czech_ci. Given this example: WHERE `firstName` = 'ales' collate utf8_czech_ci I am unable to find the result aleš (which is a common Czech name). When I try this: WHERE `firstName` = 'ales' collate utf8_general_ci It successfully finds...

Java unexpected character parsing txt file

java,utf-8
I am trying to divide txt files into ArrayList of strings and so far it works, but first words in the file always starts with (int)'65279' and I can't even copy this character here. Also, in GUI it looks like second letter of word is missing but at the same...
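The value 65279 is U+FEFF, the byte order mark, which some editors write at the start of UTF-8 files. A Python sketch (not the asker's Java code) showing where it comes from and one way to drop it while decoding; the sample bytes are made up:

```python
raw = b"\xef\xbb\xbfhello world"  # hypothetical file contents with a BOM

plain = raw.decode("utf-8")         # the BOM survives as the first character
stripped = raw.decode("utf-8-sig")  # this codec removes a leading BOM

print(ord(plain[0]), repr(stripped))
```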

How to print bit representation of unicode character

c++,windows,unicode,utf-8
I try to get binary utf-8 representation of unicode character like on image : but this works only with <128 characters: Here is my code: #include <string> #include <iostream> #include <windows.h> std::string contoutf8(std::wstring str) { int utf8_size = WideCharToMultiByte(CP_UTF8, 0, str.c_str(), str.length(), nullptr, 0, nullptr, nullptr); std::string utf8_str(utf8_size, '\0'); WideCharToMultiByte(CP_UTF8,...

Php mysql query charset

php,mysql,utf-8
I have a problem with the query charset for a MySQL db in a PHP script. I'm placing a GET parameter inside a select query and it works well for all Latin characters, but with Cyrillic characters it returns an empty table. If I place some value with Cyrillic in query instead of GET parameter, query...

Is the String Constructor from UTF-8 Broken?

java,android,utf-8
I have the following code that loads a null terminated multi-byte string from a buffer. It nominally interprets the data as UTF-8 but, if that conversion fails, it then interprets the data as ISO-8859-1. Here is the code: @Override public String format(String date_format, boolean use_locale, int precision) { String rtn...

export sql in php UTF-8 [closed]

php,sql,database,utf-8,export
I want to export all the tables of my db to an SQL file using PHP with UTF-8 encoding. What's the best way? What's the code? Thank you.

Centos System config language not working

linux,utf-8,centos5
I am using Centos 5.x. I have set the system-config-language in /etc/sysconfig/i18n file as LANG="en_US.UTF-8" SYSFONT="latarcyrheb-sun16" But when i type echo $LANG it shows as : fr_FR.UTF_8 Please let me know where can i change the settings so that i will get English language as default. Thanks Rajasekhar...

Loss of quotes when encoding into ascii

python,regex,utf-8,ascii
I wish to extract the text between quotations from news articles. For this purpose the first step involves extracting the new articles. Then in the second step using a regex to get the quotations. I am not sure but the quotations get lost when I encode into ascii. Is there...

Python: difficulty converting ascii to unicode

python,unicode,encoding,utf-8
My goal: get the page source from a url and count all instances of a keyword within that page source How I am doing it: getting the pagesource via urllib2, looping through each char of the page source and comparing it to the keyword My problem: my keyword is encoded...

Can't get UTF-8 Special Chars to Correctly Write to MySQL (PHP)

php,mysql,encoding,utf-8
I am creating a PHP script run from the command line on a typical LAMP stack (L = OS X) and am having a lot of trouble getting special chars to record properly in the database. This script scans a directory recursively and inserts complete path into a MySQL database...

Replace utf8 literals in string

php,utf-8
I am receiving from an external service a string with some utf8 literal in it. $a = $param1; echo $a; \xe7\xe3 How can I convert $a (an utf8 string with 8 characters) to 'çã'? I know I can use strtr with a map of substitutions, but I think that maybe...
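In this excerpt the eight characters `\xe7\xe3` are escape text, and 0xE7 and 0xE3 are the code points of ç and ã rather than UTF-8 bytes. A Python sketch (not PHP) that interprets the escapes and then produces real UTF-8, assuming the input contains only ASCII characters as it does here:

```python
received = r"\xe7\xe3"  # eight literal characters, as in the excerpt

# "unicode_escape" turns each \xNN into the code point U+00NN.
decoded = received.encode("ascii").decode("unicode_escape")
utf8_bytes = decoded.encode("utf-8")

print(decoded, utf8_bytes.hex())
```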

The File Encoding Is utf8 but is in Windows-1256 readable

encoding,utf-8
I am working on files with unknown encoding at first but I get the encoding with this lines in JAVA: InputStream in = new FileInputStream(new File("D:\\lbl2\\1 (26).LBL")); InputStreamReader inputStreamReader = new InputStreamReader(in); System.out.print(inputStreamReader.getEncoding()); and we get UTF8 in output. but the problem is that when I try to see file...

Does HTML Encoding have any cons?

asp.net-mvc,razor,encoding,utf-8,xss
I develop a project on ASP.NET MVC framework. All files and charsets are in UTF-8. I'm using model bindings and in some of my models the display property includes some accented chars or single/double quotes. As Razor engine automatically encodes helpers (ie. DisplayNameFor) the accented chars and quotes are encoded....

showing umlauts in html with utf8 charset

html,utf-8
This question is most likely answered many times before, but I have searched some hours now and I still don't understand one basic thing (most probably the utf8-charset itself...). I have a html with german umlauts "ä" and "ö" (&auml; and &ouml;): <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> </head>...

How to understand text language in utf8 encoded text?

node.js,utf-8,character-encoding,redis,language-detection
Redis is using utf8 code and for my project I need to get text language which is utf8 encoded text. Is there any way that can give a clue about the language of the text? EDIT: My project is on NodeJs programming language. In Redis maybe lua script has a...

How to set the filename encoding globally for a python interpreter?

python,encoding,utf-8,filenames
In a project, we need to open many files all over the place in the source code. We know that all filenames on disk are encoded in utf-8 and all filenames are proceed as Unicode internally. Is there a way to set the filename encoding globally for the running interpreter,...

NSJSONSerialization not reading UTF8 correctly [duplicate]

ios,objective-c,uitableview,utf-8
This question already has an answer here: HTML character decoding in Objective-C / Cocoa Touch 10 answers I'm reading a JSON from an URL. It is UTF8 formatted. When I load the UITableView It shows incorrect characters. Please find attached screenshot at row 2 The code that reads the...

Encode UTF-8 for list

python,python-2.7,encoding,utf-8
I'm using selenium to retrieve a list from a javascript object. search_reply = driver.find_element_by_class_name("ac_results") When trying to write to csv, I get this error: Traceback (most recent call last): File "insref_lookup15.py", line 54, in <module> wr_insref.writerow(instrument_name) UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 22: ordinal not in range(128)...

R encoding UTF-8: U+0080-U+009F

r,utf-8,character-encoding
I am struggling with some encoding issues. I have many textfiles that contain rows in the following format: https://dl.dropboxusercontent.com/u/94114397/example.txt According to Notepad++, these are all encoded in UTF-8 and most non-ASCII characters are displayed correctly, as you can see in lines 1 and 2. However, I have problems with some...

Converting Unicode codepoints to UTF-8 in C using iconv

c,unicode,encoding,utf-8,iconv
I want to convert a 32-bit value, which represents a Unicode codepoint, into a sequence of chars which is the utf-8 encoded string containing only the character corresponding to the codepoint. For example, I want to turn the value 955 into the utf-8 encoded string "λ". I tried to do...
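The conversion asked about here can also be done by hand with a few shifts and masks, without iconv. The sketch below is in Python rather than C; `encode_code_point` is a hypothetical helper that does not reject surrogates or values above U+10FFFF, and it is compared with Python's built-in encoder as a sanity check:

```python
def encode_code_point(cp):
    """Encode one Unicode code point as UTF-8 bytes (no range validation)."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


lambda_bytes = encode_code_point(955)
print(lambda_bytes.hex(), lambda_bytes.decode("utf-8"))
```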

Can you skip non utf-8 data in python csv?

python,csv,utf-8
I am dealing with a very large csv file in python where some lines are throwing an error "'utf-8' codec can't decode byte 0x9b in position 7657: invalid start byte". Is there a way to skip lines that aren't utf-8 without going by hand and deleting or fixing data? for...
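One way to skip the offending lines is to read the file in binary mode, try to decode each line, and only hand the lines that decode cleanly to the CSV parser. The sketch below assumes no quoted field spans more than one line; `read_valid_rows` and the sample data are hypothetical:

```python
import csv
import io


def read_valid_rows(binary_stream):
    """Parse CSV rows, skipping any line that is not valid UTF-8."""
    rows = []
    for raw_line in binary_stream:
        try:
            line = raw_line.decode("utf-8")
        except UnicodeDecodeError:
            continue  # drop the undecodable line
        rows.extend(csv.reader([line]))
    return rows


sample = io.BytesIO(b"a,b\n" + b"bad,\x9b\n" + "c,\u00e9\n".encode("utf-8"))
rows = read_valid_rows(sample)
print(rows)
```

If losing whole lines is too drastic, opening the file in text mode with `errors="replace"` keeps every row and substitutes U+FFFD for the bad bytes instead.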

ctrl+G in erl doesn't work

unicode,encoding,utf-8,erlang,docker
I'm trying to interconnect Erlang nodes, but entering ctrl+G does not work: Eshell V6.4.1 (abort with ^G) 1> ^G Eshell V6.4.1 (abort with ^G) 1> ^G Eshell V6.4.1 (abort with ^G) 1> ^G Eshell V6.4.1 (abort with ^G) Any idea why this can happen? I was thinking about locale settings,...

ZXing 2D barcode decoding: UTF-8 characters not decoded properly

java,utf-8,barcode,zxing
I'm trying to use ZXing to read 2D barcodes and it mostly works fine, except it doesn't really recognize some UTF-8 characters like č and ć. I'm using this code to set the encoding: MultiFormatReader reader = new MultiFormatReader(); Hashtable hints = new Hashtable(); hints.put(EncodeHintType.CHARACTER_SET, "UTF-8"); reader.setHints(hints); result = reader.decode(bitmap);...

How to use UTF-8 in C code?

c,utf-8
My setup: gcc-4.9.2, UTF-8 environment. The following C-program works in ASCII, but does not in UTF-8. Create input file: echo -n 'привет мир' > /tmp/вход This is test.c: #include <stdio.h> #include <stdlib.h> #include <string.h> #define SIZE 10 int main(void) { char buf[SIZE+1]; char *pat = "привет мир"; char str[SIZE+2]; FILE...

Java 8 change in UTF-8 decoding

java,utf-8,java-8,regression
We recently migrated our application to JDK 8 from JDK 7. After the change, we ran into a problem with the following snippet of code. String output = new String(byteArray, "UTF-8"); The byte array may contain invalid UTF-8 byte sequences. The same byte array upon UTF-8 decoding, results in two...

Turkish characters are not shown on TextView

android,utf-8,textview
Turkish characters are not shown on TextView. I have read previous questions and made some changes, but they did not solve my problem. Here is a picture: Here is my changed code: holder.txtGazeteName.setText(Html.fromHtml(gazetelerArrayList.get(position).getName()).toString()); Here is the input for the ArrayList: gazete = new GazetelerClass(); gazete.setName("YeniŞafak"); gazete.setAdress("http://www.yenisafak.com.tr/yazarlar/"); gazete.setImage(R.drawable.yenisafak); gazetelerArrayList.add(gazete); ...

Encoding problems in Python - 'ascii' codec can't encode character '\xe3' when using UTF-8

python,encoding,utf-8
I've created a program to print out some HTML content. My source file is in utf-8, the server's terminal is in utf-8, and I also use: out = out.encode('utf8') to make sure the string is in utf8. Despite all that, when I use some characters like "ã", "é" in...

Writing byte array to an UTF8-encoded file

java,utf-8,java-io
Given a byte array in UTF-8 encoding (as the result of base64 decoding of a String), what is the correct way to write it to a file in UTF-8 encoding? Is the following source code (writing the array byte by byte) correct? OutputStreamWriter osw = new OutputStreamWriter( new FileOutputStream(tmpFile),...

How to remove last utf8 char of a python string

python,python-2.7,utf-8
I have a string containing utf-8 encoded text. I need to remove the last utf-8 character. So far I did msg = msg[:-1] but this only removes the last byte. It works as long as the last character is an ASCII code. It doesn't work anymore when the last character...
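
A minimal sketch of one way to do this: decode the bytes to text, drop the last code point, and encode again. It is written for Python 3 `bytes`, and the function name `drop_last_char` is hypothetical. On Python 2.7 the same decode, slice, encode chain applies to a byte `str`.

```python
def drop_last_char(data: bytes) -> bytes:
    """Remove the last code point from UTF-8 encoded bytes."""
    text = data.decode("utf-8")         # bytes -> text, one item per code point
    return text[:-1].encode("utf-8")    # slice by character, then re-encode


print(drop_last_char("caf\u00e9".encode("utf-8")))  # b'caf'
print(drop_last_char(b"abc"))                       # b'ab'
```

Note that this removes one code point, not one visible character: a letter followed by a combining accent would lose only the accent.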

How to remove any utf8mb4 characters in string

c#,.net,utf-8,utf8mb4
Using C#, how can utf8mb4 characters (emoji, etc.) be removed from a string, so that the result is fully utf8 compliant? Most of the solutions involve changing the database configuration, but unfortunately I don't have that possibility....
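
The question is about C#. The underlying rule is language-independent, so here is a Python sketch of it, assuming "utf8" means MySQL's three-byte `utf8` charset, which stores only code points up to U+FFFF. The function name `strip_non_bmp` is hypothetical.

```python
def strip_non_bmp(text: str) -> str:
    """Drop every code point above U+FFFF (those needing 4 bytes in UTF-8)."""
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


print(strip_non_bmp("hi \U0001F600!"))  # 'hi !'
```

In C#, strings are UTF-16, so the equivalent filter would remove surrogate pairs rather than compare code point values directly.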

How to make sure a XDocument is saved with utf-8 file encoding?

c#,xml,unicode,encoding,utf-8
I am creating an XML file with the following code (the byte array returned by Serialize() is written to a FileStream later): public byte[] Serialize() { using (var stream = new MemoryStream()) { WriteXmlToStream(stream); stream.Position = 0; using (var reader = new StreamReader(stream)) { string resultString = reader.ReadToEnd(); return Encoding.UTF8.GetBytes(resultString);...

Replace special quotes with normal

vb.net,replace,utf-8
In VB.NET, how do I replace special opening and closing double quotes (“ and ”) with ASCII quotes (")? I've tried s = s.replace("“", """") but it seems that Visual Studio considers the “ quote in my code to be a normal quote, leaving me with an invalid statement....

Why does Python 3 output \xe3, an extra char?

python,python-3.x,unicode,utf-8
Why does Python add \xe3 in the output of: >>> b'Transa\xc3\xa7\xc3\xa3o'.decode('utf-8') 'Transaç\xe3o' Expected value is: 'Transação' Some more information about my environment: >>> import sys >>> print (sys.version) 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit (AMD64)] >>> sys.stdout.encoding 'cp437' This was under Console 2 + Powershell....
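
A likely explanation, sketched here rather than confirmed by the truncated question: the decoded string is correct, but the interactive prompt has to display it in the console encoding (cp437 above). cp437 contains ç but not ã, so the character it cannot encode is shown as a backslash escape. The snippet reproduces that display step explicitly. The variable names are illustrative.

```python
text = b'Transa\xc3\xa7\xc3\xa3o'.decode('utf-8')

# The decoded value itself is what the question expects.
assert text == 'Transação'

# Simulate display on a cp437 console: characters the code page lacks
# are written as backslash escapes instead of raising an error.
shown = text.encode('cp437', errors='backslashreplace').decode('cp437')
print(shown)
```

If this is the cause, the data is intact and only the console rendering differs. A terminal or code page that supports ã would show the full word.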

What happens under the hood when bytes converted to String in Java?

java,string,unicode,utf-8,byte
I have a problem when trying to convert bytes to String in Java, with code like: byte[] bytes = {1, 2, -3}; byte[] transferred = new String(bytes, Charsets.UTF_8).getBytes(Charsets.UTF_8); and the original bytes are not the same as the transferred bytes, which are respectively [1, 2, -3] [1, 2, -17, -65,...
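
The same effect can be reproduced in Python, which may help to see what happens: a byte that is not valid UTF-8 is replaced on decoding by U+FFFD (the replacement character), and encoding U+FFFD back to UTF-8 yields three bytes. This is a sketch of the general mechanism using Python's `errors="replace"`, not a trace of the Java internals.

```python
# Java's signed byte -3 is 0xFD when read as unsigned.
original = bytes([1, 2, 0xFD])

# 0xFD cannot start a UTF-8 sequence, so decoding substitutes U+FFFD.
decoded = original.decode("utf-8", errors="replace")

# Encoding U+FFFD gives the three bytes EF BF BD.
round_tripped = decoded.encode("utf-8")

# Convert to Java-style signed bytes for comparison with the question.
signed = [b - 256 if b > 127 else b for b in round_tripped]
print(signed)  # [1, 2, -17, -65, -67]
```

The round trip is lossy by design: arbitrary binary data should not be passed through a text decoder if the original bytes must be recovered.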

Encoding error in Rails 4 when querying mysql DB

mysql,ruby,ruby-on-rails-4,encoding,utf-8
I am receiving this error under certain situations when querying my db which is on an aws server (mysql). I have the db configured to utf8 and have the rails app set up to utf-8 as well in the config/application.rb file. Any suggestions? AwsCourse Load (36.5ms) SELECT COURSE.* FROM COURSE...

Force UTF-8 encoding in inline CSS

css,encoding,utf-8
I have inline CSS loading within an element which renders a close (X) icon on a popup overlay. When rendered on a page which is not in UTF-8 it is rendered badly with local characters instead. Code is as follows: .close{ position:absolute; top:-14px; right:-13px; cursor:pointer; color: #fff; border: 1px solid...

How to convert euro (€) symbol from Windows-1252 to UTF-8?

php,encoding,utf-8,windows-1252
A piece of software generates a Windows-1252 XML file for me, and I would like to parse it in PHP and send the data to my database in UTF8. I tried a lot of solutions, such as the iconv or utf8_encode functions, but with no result. It displays things like €, but not just €......
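
The question is about PHP. As an illustration of the conversion itself, here is a Python sketch showing that the euro sign is the single byte 0x80 in Windows-1252 and three bytes in UTF-8. The function name `cp1252_to_utf8` is hypothetical.

```python
def cp1252_to_utf8(data: bytes) -> bytes:
    """Re-encode Windows-1252 bytes as UTF-8."""
    return data.decode("cp1252").encode("utf-8")


# 0x80 is the euro sign in Windows-1252.
print(cp1252_to_utf8(b"\x80"))  # b'\xe2\x82\xac'
```

One thing worth checking in the PHP version: `utf8_encode` converts from ISO-8859-1, where 0x80 is a control character rather than the euro sign, so the source encoding has to be named explicitly as Windows-1252 (CP1252) for this character to survive.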

Why are my pictures corrupted after downloading and writing them in python?

python,facebook,utf-8
Preface This is my first post on stackoverflow so I apologize if I mess up somewhere. I searched the internet and stackoverflow heavily for a solution to my issues but I couldn't find anything. Situation What I am working on is creating a digital photo frame with my raspberry pi...

Working with characters based on their UTF-8 hex codes

javascript,jquery,unicode,utf-8
I'm working on something that will read a user's text messages and export them to a csv file, which they can then download. The messages are being retrieved from a third-party web interface—I am essentially using js to grab the html of each message and compiling it as needed. The...