ahrink/uniprint

uniprint

Adds uniprint, a small C99 utility that converts Unicode code points (e.g. 0041, U+1F512) into UTF-8 bytes. Argument-only, no trailing newline, designed for safe composition in POSIX sh. Includes strict Unicode scalar validation and UTF-8 encoding core.

uniprint is a small C99 utility that converts a Unicode code point into its UTF-8 byte sequence and writes it to standard output.

The program is argument-only (no stdin piping) and produces no trailing newline by default, making it suitable for safe composition in POSIX sh scripts and shell functions.


Synopsis

uniprint CODEPOINT
uniprint --help

Description

Shells are inconsistent in their handling of escape sequences and Unicode
output. uniprint centralizes Unicode parsing and UTF-8 encoding in C, where
byte-level behavior is explicit and portable.

The tool accepts a textual representation of a Unicode code point, validates
it as a Unicode scalar value, encodes it as UTF-8, and writes the resulting
bytes to stdout.
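
The encoding step described above can be sketched roughly as follows. This is a minimal illustration, not the code in ahr_utf8: the function name utf8_encode_sketch and its return convention (byte count, or 0 on rejection) are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the ahr_utf8 API): encode one code point.
 * Writes 1 to 4 bytes into out and returns the byte count, or returns 0
 * if cp is not a Unicode scalar value. */
size_t utf8_encode_sketch(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    /* Reject values above U+10FFFF and the surrogate range. */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

A caller would then pass the buffer and the returned length to fwrite on stdout, without appending a newline, to match the behavior described above.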

Code Point Format

The CODEPOINT argument may be specified in any of the following forms:

U+XXXX

XXXX

0xXXXX

Hexadecimal is assumed for:

U+ prefixed values

0x prefixed values

plain hexadecimal tokens of length 4–6
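
One way to parse these forms is sketched below. It is an illustration under stated assumptions, not the parser in uniprint.c: the name parse_codepoint_sketch is hypothetical, prefixes are matched case-insensitively, prefixed values may carry 1 to 6 hex digits, and any plain token outside the 4–6 digit rule is simply rejected because its handling is not specified here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parser (assumption, not the uniprint.c code).
 * Accepts U+XXXX, 0xXXXX, or a plain hex token of 4 to 6 digits.
 * Returns 0 and stores the value in *out on success, -1 otherwise.
 * Range and surrogate checks are left to a separate validation step. */
int parse_codepoint_sketch(const char *s, uint32_t *out)
{
    assert(s != NULL && out != NULL);

    int prefixed = 0;
    if ((s[0] == 'U' || s[0] == 'u') && s[1] == '+') {
        s += 2;
        prefixed = 1;
    } else if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        s += 2;
        prefixed = 1;
    }

    size_t len = strlen(s);
    if (prefixed) {
        if (len < 1 || len > 6)   /* assumed digit limit for prefixed forms */
            return -1;
    } else if (len < 4 || len > 6) {
        return -1;                /* plain tokens must be 4 to 6 digits */
    }

    uint32_t v = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!isxdigit(c))
            return -1;
        unsigned d = isdigit(c) ? (unsigned)(c - '0')
                                : (unsigned)(tolower(c) - 'a') + 10u;
        v = (v << 4) | d;         /* at most 6 digits, so no overflow */
    }
    *out = v;
    return 0;
}
```

Capping the digit count at six keeps the accumulated value within 24 bits, so the loop needs no overflow check before the range validation runs.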

Valid Unicode scalar values are:

U+0000 to U+10FFFF

excluding the surrogate range U+D800–U+DFFF
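
The scalar-value rule above amounts to two range checks. The sketch below distinguishes the two rejection reasons, which could be useful when choosing an error message. The enum and function names are hypothetical and are not taken from ahr_utf8.h.

```c
#include <stdint.h>

/* Hypothetical status codes (assumption, not the ahr_utf8 API). */
enum cp_status_sketch {
    CP_OK = 0,          /* valid Unicode scalar value */
    CP_SURROGATE,       /* inside U+D800..U+DFFF */
    CP_OUT_OF_RANGE     /* above U+10FFFF */
};

/* Classify a code point according to the scalar-value rule. */
enum cp_status_sketch classify_codepoint_sketch(uint32_t cp)
{
    if (cp > 0x10FFFF)
        return CP_OUT_OF_RANGE;
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return CP_SURROGATE;
    return CP_OK;
}
```

The boundary values are the interesting cases: U+D7FF and U+E000 are valid, while U+D800, U+DFFF, and anything from 0x110000 upward are rejected.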

Examples
uniprint 0041

uniprint U+1F512


POSIX sh usage:

sym=$(uniprint 0041)
printf '%s\n' "$sym"


Shell function wrapper:

uniFN() {
    uni=$1
    art=$(uniprint "$uni")
    printf '%s' "$art"
}

Exit Status

0 success

>0 invalid input or encoding error

Error messages are written to standard error.

Notes

uniprint does not interpret shell escape sequences.

For emitting raw UTF-8 byte sequences from scripts, tools such as awk
may be used.

UTF-8 encoding and Unicode validation are implemented in ahr_utf8.
C/
├── Makefile
├── includes/
│   └── ahr_utf8.h
└── src/
    ├── ahr_utf8.c
    └── uniprint.c
