Terminal should force pseudoconsole host into UTF-8 codepage by default #1802

Open
@DHowett-MSFT

Description

It's 2019, after all. Maybe we should introduce a flag that starts up the pseudoconsole host in codepage 65001 so that we make good on our promise of "emoji just work and everything else works like it should too," and use WT as a real opportunity to push the boundaries here.

  🌛
💪 💪
  👖

maintainer note, Aug 2023

It's {{current_year}}, after all

Also, we want to take arbitrary codepages into account, à la #15678

Activity

added labels on Jul 3, 2019:
  • Product-Conpty: For console issues specifically related to conpty
  • Area-TerminalConnection: Issues pertaining to the terminal<->backend connection interface
  • Issue-Task: It's a feature request, but it doesn't really need a major design.
added this to the Terminal v1.0 milestone on Jul 3, 2019
ghost added the Needs-Triage label (It's a new issue that the core contributor team needs to triage at the next triage meeting) on Jul 3, 2019

miniksa (Member) commented on Jul 3, 2019

I_am_okay_with_this.jpg

MicheleCicciottiWork commented on Jul 4, 2019

Does cmd support batch scripts in codepage 65001 now?

hcoona commented on Jul 5, 2019

+1 for running *nix tools with CJK outputs

For example: fc-list in texlive

rivy commented on Jul 6, 2019

A workaround in the meantime is to enable "Beta: Use Unicode UTF-8 for worldwide language support" in the "Control Panel \ Clock and Region \ Region \ Administrative \ Change system locale..." dialog box and reboot. Do note the BETA prefix.

zadjii-msft (Member) commented on Jul 8, 2019

I'm on board with this, esp. if we add a "disableAutoCp65001" (boy that needs a better name) setting to disable this behavior, set to false by default.

removed the Needs-Triage label (It's a new issue that the core contributor team needs to triage at the next triage meeting) on Jul 8, 2019

driver1998 commented on Sep 26, 2019

Maybe we need to somehow enable localized messages in CMD while the codepage is 65001.
Right now its messages are forced to English.

driver1998 commented on Oct 4, 2019

> A work-around in the meanwhile is ... enable BETA: Use Unicode UTF-8 for worldwide language support in the "Control Panel \ Clock and Region \ Region \ Administrative \ Change system locale..." dialog box and rebooting. Do note the BETA prefix.

This changes the system code page to 65001, so any ANSI application you have will be forced to use UTF-8.
That should be fine for English users, but it will cause big trouble for CJK users.
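
To illustrate why that is a problem for CJK users: text that a legacy ANSI application wrote in a double-byte code page is usually not well-formed UTF-8, so it gets garbled once the system code page becomes 65001. The sketch below is only an illustration; `is_valid_utf8` is a hypothetical helper, not part of any Windows API, and the sample bytes assume "中文" encoded in code page 936 (GBK).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `bytes` is well-formed UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong forms,
// UTF-16 surrogates, and code points above U+10FFFF.
bool is_valid_utf8(const std::string& bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n)
    {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        if (lead < 0x80) { ++i; continue; } // plain ASCII

        std::size_t extra = 0;   // number of continuation bytes expected
        unsigned long cp = 0;    // decoded code point
        unsigned long min = 0;   // smallest code point allowed for this length
        if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min = 0x10000; }
        else return false;              // stray continuation or invalid lead byte

        if (i + extra >= n) return false; // sequence is truncated

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        i += extra + 1;
    }
    return true;
}
```

With this helper, the UTF-8 bytes for "中文" (`E4 B8 AD E6 96 87`) pass, while the GBK bytes for the same text (`D6 D0 CE C4`) are rejected. Some legacy byte sequences can happen to be valid UTF-8, so a check like this detects most mismatches, not all of them.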

31 remaining items

modified the milestones: Backlog, 22H2 on Jan 4, 2022

o-sdn-o commented on May 20, 2023

Since std input works well with UTF-8 encoding today (#14745), maybe it's worth adding some syntactic sugar to push/pop the initial state of the console code page for new Windows console applications? Wrap it up in a single API call, or put it into a specific header file (e.g. <iostream>), to reduce the following boilerplate code at the beginning of a program:

#define UTF8_EVERYWHERE
#include <cstdlib>   // std::atexit
#include <iostream>
#include <string>

// Put this block inside <iostream> on Windows?
#ifdef _WIN32
    #ifdef UTF8_EVERYWHERE
        #include <windows.h>
        namespace winapi_cp_state
        {
            static UINT ou_state = GetConsoleOutputCP(); // Save the console's original output code page...
            static UINT in_state = GetConsoleCP();       // ...and its original input code page.
            static void set_page(UINT out, UINT in) { SetConsoleOutputCP(out); SetConsoleCP(in); }
            static void set_page() { set_page(ou_state, in_state); }
            // Switch to UTF-8 during static initialization and restore the original code pages at exit.
            static int _state = (set_page(CP_UTF8, CP_UTF8), std::atexit(set_page));
        }
    #endif
#endif

// x-platform code
int main()
{
    std::cout << "Test: あああ🙂🙂🙂日本👌中文👍Кириллица\n"; // Make sure this source file is saved as UTF-8 (code page 65001).
    std::cout << "Enter text: ";
    std::string utf8;
    std::cin >> utf8; // Reads a single whitespace-delimited word.
    std::cout << "UTF-8 text: " << utf8 << std::endl;
    return 0;
}

The #define UTF8_EVERYWHERE key indicates the programmer's intention to use UTF-8 encoding instead of the original console code page.
This would be extremely helpful for newbie console programmers, who are all completely confused by text encodings. In addition, cross-platform compatibility is achieved automatically.
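
The same push/pop idea can also be sketched as a scope guard instead of static initialization plus `atexit`. This is only one possible shape, not a proposed API: the class name `ScopedConsoleUtf8` and its `supported()`/`active()` members are hypothetical, and the sketch assumes that a return value of 0 from `GetConsoleOutputCP`/`GetConsoleCP` means there was nothing to save. On non-Windows platforms it compiles to a no-op.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical scope guard: switches the console to UTF-8 on construction
// and restores the previous console code pages on destruction.
class ScopedConsoleUtf8
{
public:
    ScopedConsoleUtf8()
    {
#ifdef _WIN32
        saved_out_ = GetConsoleOutputCP();
        saved_in_ = GetConsoleCP();
        const bool out_ok = SetConsoleOutputCP(CP_UTF8) != 0;
        const bool in_ok = SetConsoleCP(CP_UTF8) != 0;
        active_ = out_ok && in_ok;
#endif
    }

    ~ScopedConsoleUtf8()
    {
#ifdef _WIN32
        // Assumption: 0 means no code page was saved, so skip the restore.
        if (saved_out_ != 0) SetConsoleOutputCP(saved_out_);
        if (saved_in_ != 0) SetConsoleCP(saved_in_);
#endif
    }

    ScopedConsoleUtf8(const ScopedConsoleUtf8&) = delete;
    ScopedConsoleUtf8& operator=(const ScopedConsoleUtf8&) = delete;

    // True only when built for Windows, where the console APIs exist.
    static constexpr bool supported()
    {
#ifdef _WIN32
        return true;
#else
        return false;
#endif
    }

    // True if both code pages were switched to UTF-8 successfully.
    bool active() const { return active_; }

private:
    unsigned int saved_out_ = 0;
    unsigned int saved_in_ = 0;
    bool active_ = false;
};
```

Declaring `ScopedConsoleUtf8 guard;` as the first statement of `main` gives roughly the behavior of the snippet above. One trade-off: destructors of local objects do not run when the program leaves through `std::exit`, whereas handlers registered with `atexit` do, so the guard restores the code pages only on a normal return from `main`.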

modified the milestones: 22H2, Backlog on Jul 5, 2023

zadjii-msft (Member) commented on Jul 5, 2023

xref some discussion in #15504

pretty sure our plan was to do:

  • compatibility.defaultToutf8 in the Terminal settings, which sets
  • a --flag to conpty to tell it to default to CP65001 instead.

Now it's just a matter of plumbing, and deciding if we really want to do the --flags thing or the --defaultToUtf8 thing.


More notes

conhost --codepage 65001 to start in UTF-8; the flag would accept an arbitrary codepage as well.
conhost --codepage WITHOUT AN ARG to use the one in HKCU/Console/Codepage
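
The two notes above could be read as the following resolution rule. This is a minimal sketch of the idea only, not conhost's actual argument handling: `resolve_codepage` is a hypothetical function, the registry value is passed in by the caller rather than read from HKCU, and treating a non-numeric token after the flag as "no argument" is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: true if `s` is a short, non-empty run of decimal digits.
static bool is_number(const std::string& s)
{
    if (s.empty() || s.size() > 9) return false; // 9 digits always fit in unsigned long
    for (const char ch : s)
    {
        if (!std::isdigit(static_cast<unsigned char>(ch))) return false;
    }
    return true;
}

// Hypothetical sketch of the rule in the notes:
//   --codepage N      -> use N
//   --codepage        -> use the registry default supplied by the caller
//   (flag not given)  -> no override; the caller keeps its existing behavior
std::optional<unsigned> resolve_codepage(const std::vector<std::string>& args,
                                         unsigned registry_default)
{
    for (std::size_t i = 0; i < args.size(); ++i)
    {
        if (args[i] != "--codepage") continue;
        if (i + 1 < args.size() && is_number(args[i + 1]))
        {
            return static_cast<unsigned>(std::stoul(args[i + 1]));
        }
        return registry_default; // flag present without a usable argument
    }
    return std::nullopt;
}
```

For example, `resolve_codepage({"conhost", "--codepage", "65001"}, 437)` yields 65001, while `resolve_codepage({"conhost", "--codepage"}, 932)` falls back to 932.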

Labels

  • Area-TerminalConnection: Issues pertaining to the terminal<->backend connection interface
  • Issue-Task: It's a feature request, but it doesn't really need a major design.
  • Priority-3: A description (P3)
  • Product-Conpty: For console issues specifically related to conpty
  • Product-Terminal: The new Windows Terminal.

Participants

@rivy, @DHowett, @methane, @hcoona, @eryksun