Is dataframe just a table?
Full article can be found here
Proposal
The author explains why dataframes and tables are designed differently.
Details
- database folks think in terms of tables (aka relations)
- data science folks think in terms of dataframes (read: pandas)
- the former are built on sets, whereas the latter are built on lists
- the author argues that this affects how the implementations differ between the two use cases
- because lists are dependent on the order whereas sets aren't
- matrices vs tables
- matrices are good at aggregation kind of operations
- however, storing the variables as columns too turns such a "matrix" into a relation, which makes aggregation just as easy there
- however, tables seem to outshine dataframes when it comes to joins
- closest operation in dataframes is a merge
- the author argues that the merge and concat ops in dataframes are not actually the same as the join and union ops in tables, and that these dataframe ops are needlessly complicated
- dataframes are procedural interfaces while relations offer more declarative style
- SQL ops, e.g., only support first-order logic (looping across items is second-order)
- code reuse is better with dataframes than with tables
- debugging is also better with dataframes
- tables have query-optimizers which can lead to better optimized operations internally
- in dataframes, users have to manually tune for perf
- dataframes offer many ways to solve one thing, which can be confusing to programmers
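
Two of the points above (order-dependent lists vs. order-free sets, and declarative joins vs. procedural lookups) can be sketched with the Python standard library alone. This is an illustration under assumptions, not code from the article: the sample rows and the `people` / `scores` names are hypothetical, and a plain list of tuples stands in for a dataframe instead of pandas.

```python
import sqlite3

# Hypothetical sample data: (id, name) rows and (id, score) rows.
people = [(1, "ana"), (2, "bo"), (3, "cy")]
scores = [(3, 70), (1, 90)]

# List semantics (dataframe-like): row order is part of the value.
reordered = list(reversed(people))
lists_equal = people == reordered            # same rows, different order
# Set semantics (relation-like): row order is irrelevant.
sets_equal = set(people) == set(reordered)

# Declarative join: say what is wanted and let the engine decide how.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE scores (id INTEGER, score INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", people)
conn.executemany("INSERT INTO scores VALUES (?, ?)", scores)
sql_rows = conn.execute(
    "SELECT p.name, s.score FROM people p JOIN scores s ON p.id = s.id"
).fetchall()
conn.close()

# Procedural equivalent: the programmer picks the lookup strategy
# (here a dict index on id) and the output order by hand.
score_by_id = dict(scores)
loop_rows = [
    (name, score_by_id[pid]) for pid, name in people if pid in score_by_id
]

print(lists_equal, sets_equal)
print(set(sql_rows) == set(loop_rows))
```

The SQL result is compared as a set because a query without `ORDER BY` does not promise any row order, whereas the list comprehension always yields rows in the order of `people`.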