Row-based DataFrame builder/collector DSL #1541

@Jolanrensen

We could consider adding a DSL similar to buildList {}, called buildDataFrame {}, in which rows are yielded one at a time.

This would be more performant than appending in a loop, which creates a new DataFrame on every iteration:

var df = DataFrame.EMPTY
for (row in data) {
    df = df.append(row....)
}

It could use something like our ColumnDataCollector under the hood, with one data collector for each column the user wants in the dataframe.
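
To make the "one collector per column" idea concrete, here is a rough sketch in plain Kotlin. It does not use the real ColumnDataCollector: `SimpleColumnCollector`, `RowCollector`, `addRow` and `toColumns` are hypothetical names made up for illustration, and the result is a plain map of column values instead of a DataFrame.

```kotlin
// Hypothetical stand-in for a per-column collector (NOT the library's ColumnDataCollector).
class SimpleColumnCollector(val name: String) {
    private val values = ArrayList<Any?>()

    fun add(value: Any?) {
        values.add(value)
    }

    fun toList(): List<Any?> = values.toList()
}

// Hypothetical row collector: fans each value of a row out to the collector of its column.
class RowCollector(columnNames: List<String>) {
    private val collectors = columnNames.map { SimpleColumnCollector(it) }

    fun addRow(vararg values: Any?) {
        require(values.size == collectors.size) {
            "Expected ${collectors.size} values but got ${values.size}"
        }
        collectors.forEachIndexed { i, collector -> collector.add(values[i]) }
    }

    // Column name -> collected values, in the order the columns were declared.
    fun toColumns(): Map<String, List<Any?>> =
        collectors.associate { it.name to it.toList() }
}

fun main() {
    val collector = RowCollector(listOf("a", "b"))
    collector.addRow(1, "x")
    collector.addRow(2, null)
    println(collector.toColumns()) // {a=[1, 2], b=[x, null]}
}
```

Each row only appends to existing lists, so no intermediate DataFrame is needed until the end.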

Ideas for the DSL:

  1. append-like
val df = buildDataFrame("a", "b", "c") {
    for (row in data) {
        if (row.isSomething) continue

        append(row.myA, row.myB, row.myC)
    }
}

or

val df = buildDataFrame {
    for (row in data) {
        if (row.isSomething) continue

        append("a" to row.myA, "b" to row.myB, "c" to row.myC)
    }
}
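
A minimal sketch of how the named append variant could work, assuming a lambda with receiver. `RowAppendScope`, `buildColumns` and `Item` are hypothetical names, and `buildColumns` returns plain columns rather than a DataFrame. The null back-filling for columns missing from a row is also an assumption, not something decided in this issue.

```kotlin
// Hypothetical scope for the builder lambda; none of these names exist in the library.
class RowAppendScope {
    private val columns = LinkedHashMap<String, MutableList<Any?>>()
    private var rowCount = 0

    // Appends one row given as name-to-value pairs.
    fun append(vararg cells: Pair<String, Any?>) {
        for ((name, value) in cells) {
            // A column first seen in a later row is back-filled with nulls (assumption).
            val column = columns.getOrPut(name) { MutableList<Any?>(rowCount) { null } }
            require(column.size == rowCount) { "Column '$name' was set twice in one row" }
            column.add(value)
        }
        rowCount++
        // Columns missing from this row get a null so all columns stay equally long.
        for (column in columns.values) {
            if (column.size < rowCount) column.add(null)
        }
    }

    fun build(): Map<String, List<Any?>> = columns
}

// Stand-in for the proposed buildDataFrame {}.
fun buildColumns(block: RowAppendScope.() -> Unit): Map<String, List<Any?>> =
    RowAppendScope().apply(block).build()

// Hypothetical input type for the usage example.
data class Item(val myA: Int, val myB: String, val isSomething: Boolean)

fun main() {
    val data = listOf(Item(1, "x", false), Item(2, "y", true), Item(3, "z", false))
    val columns = buildColumns {
        for (row in data) {
            if (row.isSomething) continue
            append("a" to row.myA, "b" to row.myB)
        }
    }
    println(columns) // {a=[1, 3], b=[x, z]}
}
```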
  2. map-like
val df = buildDataFrame {
    for (row in data) {
        if (row.isSomething) continue

        add("a", row.myA) // or put?
        this["b"] = row.myB
        put("c", row.myC)  // or add?
    }
}
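
One open point for the map-like variant is how the builder knows when a row is finished. The sketch below assumes that writing a column which already has a value in the current row starts the next row; that rule, like the names `CellScope` and `buildColumnsByCell`, is an assumption for illustration only. An explicit end-of-row call would be another option.

```kotlin
// Hypothetical scope for the map-like variant; not part of the library.
class CellScope {
    private val columns = LinkedHashMap<String, MutableList<Any?>>()
    private val current = LinkedHashMap<String, Any?>()
    private var rowCount = 0

    // Enables this["b"] = value. Re-setting a column starts a new row (assumption).
    operator fun set(name: String, value: Any?) {
        if (name in current) flush()
        current[name] = value
    }

    fun put(name: String, value: Any?) = set(name, value)

    // Moves the pending row into the per-column lists.
    private fun flush() {
        if (current.isEmpty()) return
        for ((name, value) in current) {
            columns.getOrPut(name) { MutableList<Any?>(rowCount) { null } }.add(value)
        }
        rowCount++
        // Pad columns that were not set in this row.
        for (column in columns.values) {
            if (column.size < rowCount) column.add(null)
        }
        current.clear()
    }

    fun build(): Map<String, List<Any?>> {
        flush() // the last row is still pending
        return columns
    }
}

// Stand-in for the proposed buildDataFrame {}; returns plain columns.
fun buildColumnsByCell(block: CellScope.() -> Unit): Map<String, List<Any?>> =
    CellScope().apply(block).build()

fun main() {
    val columns = buildColumnsByCell {
        for (i in 1..3) {
            if (i == 2) continue
            put("a", i)
            this["b"] = "row$i"
        }
    }
    println(columns) // {a=[1, 3], b=[row1, row3]}
}
```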
  3. toDataFrame/add-like
val df = buildDataFrame {
    for (row in data) {
        if (row.isSomething) continue

        "a" from { row.myA }
        "b" from row.myB // {} are not needed here because we have a single row
        row.myC into "c"
    }
}

(This should not be confused with the column-based DynamicDataFrameBuilder)

Obligatory mention: We can already do something similar in the latest -dev version, though it creates n DataFrames under the hood:

val df = buildList {
    for (row in data) {
        if (row.isSomething) continue
        this += mapOf("a" to row.myA, "b" to row.myB, "c" to row.myC).toDataRow()
    }
}.concat()

Labels: enhancement (New feature or request)
