@Allex-Nik (Collaborator) commented Oct 29, 2025

Fixes #1496

"age" to columnOf(15, 20, 25),
"group" to columnOf(1, 1, 2),
)
val age = df["age"].cast<Int>()

Collaborator Author
What would be the best way to do that without using the compiler plugin? I also tried val age by column<Int>() followed by df[age], but that is marked as deprecated.

@Jolanrensen (Collaborator), Oct 29, 2025
I think this approach is the best for now. We deprecated the column accessor API (val myCol by column(); df[myCol]) in favor of the compiler plugin, but the String API is still available when the compiler plugin cannot be used, like here :)

You're using the String API correctly. The only alternative I can think of is df.get { "age"<Int>() } but it achieves exactly the same thing.

fun `count on dataframe`() {
df.count() shouldBe 3
df.count { age > 18 } shouldBe 2
df.count { it["name"] == "Alice" } shouldBe 1

Collaborator Author
Why can I do df.count { age > 18 } but cannot do df.count { name.startsWith("A") }? age and name are obtained in the same way, and we made sure that they have the types that are appropriate for the called methods.

Collaborator
You're calling CountTests.age inside count {}, not actually the age column from df :)

Collaborator
you can use df.count { column<Int>("age") > 18 } to access a column from df by String name and type within this DSL. Or the shortcut:
df.count { "age"<Int>() > 18 }

Collaborator
you can do the same with "name" and then you can call startsWith :)

Collaborator Author
I see, thank you!
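
The name-resolution point in this exchange can be sketched in plain Kotlin, without the DataFrame library. Everything below (`Column`, `Row`, `Table`, `value`, and the names "Bob" and "Charlie") is a hypothetical stand-in, not the library's API. The sketch also assumes, for illustration only, that the row scope supplies a comparison operator for column objects. The thread does not spell out why `age > 18` compiles, so treat that part as one possible mechanism.

```kotlin
// Hypothetical stand-ins for illustration; none of these types come from Kotlin DataFrame.
class Column<T>(val name: String)

class Row(private val values: Map<String, Any?>) {
    // Typed lookup by column name, loosely analogous to "age"<Int>() in the thread.
    @Suppress("UNCHECKED_CAST")
    fun <T> value(name: String): T = values[name] as T

    // Assumed mechanism: inside a Row scope, a Column<T> can be compared with a plain T.
    operator fun <T : Comparable<T>> Column<T>.compareTo(other: T): Int =
        value<T>(name).compareTo(other)
}

class Table(private val rows: List<Row>) {
    // The predicate runs with each Row as its receiver.
    fun count(predicate: Row.() -> Boolean): Int = rows.count { it.predicate() }
}

class CountTests {
    // Properties of the test class, not of the table.
    val age = Column<Int>("age")
    val name = Column<String>("name")

    val table = Table(
        listOf(
            Row(mapOf("name" to "Alice", "age" to 15)),
            Row(mapOf("name" to "Bob", "age" to 20)),
            Row(mapOf("name" to "Charlie", "age" to 25)),
        ),
    )

    // Row has no member called `age`, so `age` resolves to CountTests.age (a Column<Int>).
    // This compiles only because Row provides the compareTo operator above.
    fun adults(): Int = table.count { age > 18 }

    // `name.startsWith("A")` would not compile here: `name` is CountTests.name,
    // a Column<String>, and Column has no startsWith. A typed lookup by name
    // yields an actual String instead.
    fun startingWithA(): Int = table.count { value<String>("name").startsWith("A") }
}
```

Calling `CountTests().adults()` and `CountTests().startingWithA()` exercises both cases: the first goes through the outer class property plus the scope-provided operator, the second reads the value by name and type, which is the role `"name"<String>()` plays in the suggestion above.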


@Test
fun `count on empty grouped dataframe`() {
emptyDf.groupBy("group").count().count() shouldBe 0

@Allex-Nik (Collaborator, Author), Oct 29, 2025
If we do emptyDf.groupBy("group").count(), it returns a dataframe without the count column. Is that expected?

Collaborator
good catch! We should at least get an empty column named count. This causes runtime exceptions with the compiler plugin, but I think it's an issue deep inside aggregation itself... I'll make an issue

Collaborator
Thanks! #1531

Development

Successfully merging this pull request may close these issues.

Add unit tests for count function
