Add DataFrame.renderToMarkdown(): String function #1760

Open
koperagen wants to merge 13 commits into master from markdown-rendering

Conversation

@koperagen
Collaborator

fixes #525
First, I added more tests for the original renderToString, then did a little refactoring to extract the common logic, and finally added the new function.

The original rendering has some quirks that can be seen in the test asserts. For now I've preserved it as is, but we should probably improve it as well; it will be more obvious with the new tests.

@koperagen koperagen requested a review from Jolanrensen March 18, 2026 18:13
@koperagen koperagen self-assigned this Mar 18, 2026
@koperagen koperagen added the enhancement New feature or request label Mar 18, 2026
@Jolanrensen
Collaborator

Original rendering has some quirks that can be seen in test asserts

What kind of quirks do you mean?

val expected =
"""
| | name | age | city |
|---:|---:|---:|---:|
Collaborator

what do these colons do? I don't think I've seen them before in MD tables

Collaborator

ah! it's alignment, right? Uhm, maybe we should have "no alignment" as an option, as well as right/left alignment. There are some right-to-left languages that could cause issues with a default alignment. (Actually, they probably already break tables like these, they broke our toString() very much too)

Collaborator

could we still have a "no-alignment" option?

Collaborator Author

sure! it's still in progress, i'll update once more after review
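
For reference, the colons in the delimiter row are what select the alignment in GitHub-flavored Markdown tables, so a "no alignment" option amounts to emitting plain dashes. A minimal sketch, assuming hypothetical names (`MdAlign` and `separatorRow` are illustrative and not part of this PR):

```kotlin
// Hypothetical names for illustration; not the PR's API.
enum class MdAlign(val cell: String) {
    NONE("---"),    // no marker: the Markdown renderer picks its default
    LEFT(":---"),   // colon on the left: left-aligned column
    RIGHT("---:"),  // colon on the right: right-aligned column
    CENTER(":---:"),
}

// Builds the delimiter row that follows the header row of a table.
fun separatorRow(columnCount: Int, align: MdAlign): String =
    (1..columnCount).joinToString(separator = "|", prefix = "|", postfix = "|") { align.cell }

fun main() {
    println(separatorRow(4, MdAlign.RIGHT)) // |---:|---:|---:|---:|
    println(separatorRow(4, MdAlign.NONE))  // |---|---|---|---|
}
```

With `MdAlign.NONE` the reader's Markdown renderer decides the alignment, which may be the safer default for right-to-left text.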

val decimalFormat =
    if (precision >= 0) RendererDecimalFormat.fromPrecision(precision) else RendererDecimalFormat.of("%e")
top.values().map {
    escapeValue(renderValueForStdout(it, valueLimit, decimalFormat = decimalFormat).truncatedContent)
Collaborator

maybe create a new issue for exploring rendering nested frames as <details> like toStaticHtml? or at least mention this option in some kdoc :) Speaking of... small kdoc please? :))

@koperagen
Collaborator Author

koperagen commented Mar 30, 2026

@Jolanrensen please have a look when you have time. I checked the default print output on a "real world data" frame of 7500 rows x 77 columns and made a bunch of changes from there. An important one is always printing column types, on a separate line, which I think makes a lot of sense for String API ease of use / dataschema declarations.
While comparing with pandas output I stumbled upon https://pypi.org/project/tabulate/. I thought we could have a few styles as well.

@Jolanrensen
Collaborator

Wow awesome!

An important one is always printing column types, on a separate line, which I think makes a lot of sense for String API ease of use / dataschema declarations.

I agree, especially with large columnGroup/frameColumn types!

Small concern about this approach though with columnTypes = true, borders = false, rowIndex = false as it becomes quite hard to differentiate between the name/type/values.

     name  age    city
   String  Int  String
    Alice   30  Berlin
      Bob   25   Paris
  Charlie   35  London

I wonder how other libraries solve this. Maybe we could mark our types somehow too? Something like

     name   age     city
  `String` `Int` `String`
    Alice    30   Berlin
      Bob    25    Paris
  Charlie    35   London

or

     name   age     city
  String<  Int<  String<
    Alice    30   Berlin
      Bob    25    Paris
  Charlie    35   London

Okay, my examples are quite ugly, but you get the idea

@koperagen
Collaborator Author

koperagen commented Mar 31, 2026

Without rowIndex = true it's indeed a bit confusing.
I was looking at R: https://dplyr.tidyverse.org/.

[image]

pandas simply doesn't render types in print; it looks like they instead provide "info", akin to our "describe". Maybe backticks are a good option for us. Should we always wrap, or only when rowIndex = false?

): String {
    val sb = StringBuilder()
    val table = prepareTable(rowsLimit, valueLimit, columnTypes, rowIndex)
    val columnLengths = table.values.mapIndexed { col, vals ->
Collaborator

wait... so this is the... width... of each column in number of characters?
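
To illustrate what a per-column character width is used for, here is a minimal sketch, not the PR's implementation: `columns` stands in for the already-rendered cell strings of each column, and both function names are hypothetical. Each width is the length of the longest cell, and every cell is padded to that width.

```kotlin
// Sketch only: `columns` holds the rendered cell strings per column (header first).
fun columnWidths(columns: List<List<String>>): List<Int> =
    columns.map { cells -> cells.maxOfOrNull { it.length } ?: 0 }

// Right-aligns every cell to its column width and joins columns with two spaces.
fun renderRightAligned(columns: List<List<String>>): String {
    val widths = columnWidths(columns)
    val rowCount = columns.maxOfOrNull { it.size } ?: 0
    return (0 until rowCount).joinToString("\n") { row ->
        columns.indices.joinToString("  ") { col ->
            columns[col].getOrElse(row) { "" }.padStart(widths[col])
        }
    }
}

fun main() {
    val columns = listOf(
        listOf("name", "Alice", "Bob", "Charlie"),
        listOf("age", "30", "25", "35"),
        listOf("city", "Berlin", "Paris", "London"),
    )
    println(columnWidths(columns)) // [7, 3, 6]
    println(renderRightAligned(columns))
}
```

Note that `String.length` counts UTF-16 code units rather than displayed width, so wide or combining characters would need separate handling.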

rowIndex: Boolean = true,
): String = renderToString(rowsLimit, valueLimit, borders, alignLeft, columnTypes, title, rowIndex)

public interface StringBorderRenderingStyle {
Collaborator

I think you could drop the "Rendering" part. "StringBorderStyle" is already clear enough (probably "BorderStyle" too if we think it doesn't clutter the rest of the library too much)

@Jolanrensen
Collaborator

Jolanrensen commented Mar 31, 2026

     name    age      city
 <String>  <Int>  <String>
    Alice     30    Berlin
      Bob     25     Paris
  Charlie     35    London

Oh the R brackets may also work :) I think I like them better (probably because I'm used to seeing types inside <> in Kotlin too :D)

compared to:

     name    age      city
 `String`  `Int`  `String`
    Alice     30    Berlin
      Bob     25     Paris
  Charlie     35    London

As for when we should wrap them, I think we should just always do it. We had : before to signify we were displaying a type instead of a column name. So I think it would help to have it always:

┌───┬──────────┬───────┬──────────┐
│   │      name│    age│      city│
│   │  <String>│  <Int>│  <String>│
├───┼──────────┼───────┼──────────┤
│  0│     Alice│     30│    Berlin│
│  1│       Bob│     25│     Paris│
│  2│   Charlie│     35│    London│
└───┴──────────┴───────┴──────────┘

         name    age      city
     <String>  <Int>  <String>
  0     Alice     30    Berlin
  1       Bob     25     Paris
  2   Charlie     35    London

May be a bit overboard for frameColumns, but it is still quite clear we're talking about types:

            a                                     group
     <Double>  <[a:Double, group:{l:Double, r:Double}]>
  0       0.0  [1 x 2] { a:0.000000, group:{ l:0.000...
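
The "always wrap" variant discussed above could be sketched roughly as follows. `headerLines` and its `wrapTypes` flag are hypothetical and only illustrate the idea of a separate, bracketed type line (with a setting to turn the brackets off); for brevity the widths here come from the header cells alone, whereas a real renderer would also account for the data cells.

```kotlin
// Hypothetical helper; the name and parameters are illustrative, not the PR's API.
fun headerLines(names: List<String>, types: List<String>, wrapTypes: Boolean = true): List<String> {
    require(names.size == types.size) { "one type per column expected" }
    // Wrap each type in angle brackets so it cannot be mistaken for a name or a value.
    val typeCells = if (wrapTypes) types.map { "<$it>" } else types
    val widths = names.indices.map { maxOf(names[it].length, typeCells[it].length) }
    fun line(cells: List<String>) =
        cells.mapIndexed { i, cell -> cell.padStart(widths[i]) }.joinToString("  ")
    return listOf(line(names), line(typeCells))
}

fun main() {
    headerLines(listOf("name", "age", "city"), listOf("String", "Int", "String"))
        .forEach(::println)
}
```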

@Jolanrensen
Collaborator

Jolanrensen commented Mar 31, 2026

btw, what about markdown? can we / should we also display the type on a new line? Does md even support multiple lines in headers?

Edit:

         name    age      city
     <String>  <Int>  <String>
  0     Alice     30    Berlin
  1       Bob     25     Paris
  2   Charlie     35    London

 |  | name<br>\<String\> | age<br>\<Int\> | city<br>\<String\> |
 |---:|---:|---:|---:|
 | 0 | Alice | 30 | Berlin |
 | 1 | Bob | 25 | Paris |
 | 2 | Charlie | 35 | London |
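
A header row like the one above could be produced along these lines. This is a sketch under stated assumptions: `escapeCell` and `markdownHeader` are hypothetical helpers, the leading empty cell stands for the row-index column, and `<br>` relies on the target renderer allowing inline HTML.

```kotlin
// Hypothetical helpers for illustration only; not the PR's implementation.
// Backslash-escapes characters that would otherwise break a table cell or be read as HTML.
fun escapeCell(text: String): String =
    text.replace("\\", "\\\\")
        .replace("|", "\\|")
        .replace("<", "\\<")
        .replace(">", "\\>")

// Puts the bracketed type under the column name by joining them with <br>.
fun markdownHeader(names: List<String>, types: List<String>): String {
    val cells = names.zip(types) { name, type ->
        "${escapeCell(name)}<br>${escapeCell("<$type>")}"
    }
    // The leading empty cell is the header of the row-index column.
    return (listOf("") + cells).joinToString(" | ", prefix = "| ", postfix = " |")
}

fun main() {
    println(markdownHeader(listOf("name", "age", "city"), listOf("String", "Int", "String")))
}
```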

@koperagen
Collaborator Author

koperagen commented Apr 2, 2026

I'm not so in favor of mandatory brackets :(

As for newlines in markdown, I didn't think about <br>; interesting that it works! Tbh this is all confusing me. When I started I didn't know markdown is a superset of HTML. If that's the case, then why do we even need this table renderer when we already have toStaticHtml? :)) But at the same time, this table format is not even "standard" markdown.
Maybe it's worth keeping it plain ASCII for max compatibility and looking again at toStaticHtml for advanced features? It's already more versatile.

@Jolanrensen
Collaborator

Jolanrensen commented Apr 2, 2026

I'm not so in favor of mandatory brackets :(

It does provide a clearer picture, no? :) otherwise, we can always add a setting which turns it off. I liked them in the R table :)

As for Markdown... hmm, I think we could still offer it. It's like a hybrid between ascii and html, like, it's readable, but also renderable. I think there's some value in that. But then we should probably keep it as flat and simple as possible, because, as you say, for more advanced cases, we can just turn to HTML.
Maybe we could align it a bit with markdown output from interactive tables?

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Add feature to generate Markdown table

2 participants