Commit 9987afe

Use UTF-8 path encoding by default on DragonFly BSD, NetBSD and Solaris.
On DragonFly BSD 6.4.0, std::locale("") fails unless LANG is set to some locale that is supported in libc. On Solaris 11.4, std::locale("") fails even if LANG is set correctly in the environment. Recent versions of Solaris seem to have transitioned to UTF-8 for filename encoding. All BSD systems seem to have come to UTF-8 for path encoding by default, so use utf8_codecvt_facet on all of them, plus Solaris. Removed duplication of preprocessor checks for whether to use utf8_codecvt_facet.
1 parent 2013d90 commit 9987afe
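
The failure mode described in the commit message can be illustrated with standard C++ alone. `std::locale("")` throws `std::runtime_error` when the environment names a locale that the C library does not support. The helper below is hypothetical and not part of Boost.Filesystem; it is only a minimal sketch of detecting that failure and falling back to the classic "C" locale. The commit itself takes a different route: on the listed platforms it attaches `utf8_codecvt_facet` to the global locale and so does not need `std::locale("")` at all.

```cpp
#include <locale>
#include <stdexcept>

// Hypothetical helper (an assumption for illustration, not Boost code):
// try to construct the locale named by the environment (LANG, LC_ALL, ...)
// and fall back to the classic "C" locale if the C++ runtime rejects it.
std::locale environment_locale_or_classic(bool* used_fallback = nullptr)
{
    try
    {
        // Throws std::runtime_error if the environment locale is unsupported.
        std::locale loc("");
        if (used_fallback)
            *used_fallback = false;
        return loc;
    }
    catch (const std::runtime_error&)
    {
        if (used_fallback)
            *used_fallback = true;
        return std::locale::classic();
    }
}
```

Calling `environment_locale_or_classic(&flag)` on a system where `LANG` is unset or unsupported would be expected to set `flag` to `true`, although the exact behavior depends on the platform's C++ runtime.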

2 files changed

Lines changed: 35 additions & 22 deletions

doc/release_history.html

Lines changed: 1 addition & 0 deletions
@@ -53,6 +53,7 @@ <h2>1.91.0</h2>
 <li>Error codes reported by Boost.Filesystem will now use POSIX <code>errno</code> values.</li>
 </ul>
 </li>
+<li>On DragonFly BSD, NetBSD and Solaris, Boost.Filesystem default path locale now uses UTF-8 for path character encoding.</li>
 </ul>

 <h2>1.90.0</h2>

src/path.cpp

Lines changed: 34 additions & 22 deletions
@@ -1,7 +1,7 @@
 // filesystem path.cpp ------------------------------------------------------------- //

 // Copyright Beman Dawes 2008
-// Copyright Andrey Semashev 2021-2024
+// Copyright Andrey Semashev 2021-2025

 // Distributed under the Boost Software License, Version 1.0.
 // See http://www.boost.org/LICENSE_1_0.txt
@@ -28,8 +28,39 @@
 #include "windows_file_codecvt.hpp"
 #include "windows_tools.hpp"
 #include <windows.h>
-#elif defined(macintosh) || defined(__APPLE__) || defined(__APPLE_CC__) || defined(__FreeBSD__) || defined(__OpenBSD__) || defined(__HAIKU__)
+#elif defined(macintosh) || defined(__APPLE__) || defined(__APPLE_CC__) || \
+    defined(__FreeBSD__) || defined(__OpenBSD__) || defined(__DragonFly__) || defined(__NETBSD__) || defined(__NetBSD__) || \
+    defined(sun) || defined(__sun) || \
+    defined(__HAIKU__)
+// "All BSD system functions expect their string parameters to be in UTF-8 encoding
+// and nothing else." See
+// http://developer.apple.com/mac/library/documentation/MacOSX/Conceptual/BPInternational/Articles/FileEncodings.html
+//
+// "The kernel will reject any filename that is not a valid UTF-8 string, and it will
+// even be normalized (to Unicode NFD) before stored on disk, at least when using HFS.
+// The right way to deal with it would be to always convert the filename to UTF-8
+// before trying to open/create a file." See
+// http://lists.apple.com/archives/unix-porting/2007/Sep/msg00023.html
+//
+// "How a file name looks at the API level depends on the API. Current Carbon APIs
+// handle file names as an array of UTF-16 characters; POSIX ones handle them as an
+// array of UTF-8, which is why UTF-8 works well in Terminal. How it's stored on disk
+// depends on the disk format; HFS+ uses UTF-16, but that's not important in most
+// cases." See
+// http://lists.apple.com/archives/applescript-users/2002/Sep/msg00319.html
+//
+// Many thanks to Peter Dimov for digging out the above references!
+//
+// BSD systems have historically been largely encoding-agnostic wrt. filesystem paths,
+// but more recent versions have come to universally use UTF-8.
+//
+// On DragonFly BSD 6.4.0, std::locale("") fails unless LANG is set to some locale that
+// is supported in libc.
+//
+// On Solaris 11.4, std::locale("") fails even if LANG is set correctly in the environment.
+// Recent versions of Solaris seem to have transitioned to UTF-8 for filename encoding.
 #include <boost/filesystem/detail/utf8_codecvt_facet.hpp>
+#define BOOST_FILESYSTEM_DETAIL_USE_UTF8_CODECVT_FACET
 #endif

 #ifdef BOOST_FILESYSTEM_DEBUG
@@ -1427,26 +1458,7 @@ std::locale default_locale()
 #if defined(BOOST_FILESYSTEM_WINDOWS_API)
     std::locale global_loc = std::locale();
     return std::locale(global_loc, new boost::filesystem::detail::windows_file_codecvt());
-#elif defined(macintosh) || defined(__APPLE__) || defined(__APPLE_CC__) || defined(__FreeBSD__) || defined(__OpenBSD__) || defined(__HAIKU__)
-    // "All BSD system functions expect their string parameters to be in UTF-8 encoding
-    // and nothing else." See
-    // http://developer.apple.com/mac/library/documentation/MacOSX/Conceptual/BPInternational/Articles/FileEncodings.html
-    //
-    // "The kernel will reject any filename that is not a valid UTF-8 string, and it will
-    // even be normalized (to Unicode NFD) before stored on disk, at least when using HFS.
-    // The right way to deal with it would be to always convert the filename to UTF-8
-    // before trying to open/create a file." See
-    // http://lists.apple.com/archives/unix-porting/2007/Sep/msg00023.html
-    //
-    // "How a file name looks at the API level depends on the API. Current Carbon APIs
-    // handle file names as an array of UTF-16 characters; POSIX ones handle them as an
-    // array of UTF-8, which is why UTF-8 works well in Terminal. How it's stored on disk
-    // depends on the disk format; HFS+ uses UTF-16, but that's not important in most
-    // cases." See
-    // http://lists.apple.com/archives/applescript-users/2002/Sep/msg00319.html
-    //
-    // Many thanks to Peter Dimov for digging out the above references!
-
+#elif defined(BOOST_FILESYSTEM_DETAIL_USE_UTF8_CODECVT_FACET)
     std::locale global_loc = std::locale();
     return std::locale(global_loc, new boost::filesystem::detail::utf8_codecvt_facet());
 #else // Other POSIX
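
The refactoring in `src/path.cpp` follows a common preprocessor pattern: evaluate the long platform check once, record the result in a single feature macro, and test only that macro afterwards. The following is a minimal standalone sketch of that pattern, not the Boost code. The macro `EXAMPLE_USE_UTF8_PATHS`, the function `example_default_path_encoding` and the returned strings are hypothetical names chosen for illustration; the platform macros are the ones that appear in the diff above.

```cpp
#include <string>

// Step 1: decide once, near the top of the file, whether this platform is
// assumed to use UTF-8 for paths. EXAMPLE_USE_UTF8_PATHS is a hypothetical
// macro standing in for a project-specific feature macro.
#if defined(_WIN32)
// Windows is handled separately below.
#elif defined(macintosh) || defined(__APPLE__) || defined(__APPLE_CC__) || \
    defined(__FreeBSD__) || defined(__OpenBSD__) || defined(__DragonFly__) || \
    defined(__NETBSD__) || defined(__NetBSD__) || \
    defined(sun) || defined(__sun) || \
    defined(__HAIKU__)
#define EXAMPLE_USE_UTF8_PATHS
#endif

// Step 2: later code tests only the feature macro, so the platform list
// lives in exactly one place.
std::string example_default_path_encoding()
{
#if defined(_WIN32)
    return "wide-native";
#elif defined(EXAMPLE_USE_UTF8_PATHS)
    return "utf-8";
#else // Other POSIX
    return "environment-locale";
#endif
}
```

With this layout, adding another platform means editing only the check in step 1; every later `#elif defined(EXAMPLE_USE_UTF8_PATHS)` picks up the change automatically.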
