Questions tagged [dataframe]

A data frame is a tabular data structure. Usually, it contains data where rows are observations and columns are variables of various types. While "data frame" or "dataframe" is the term used for this concept in several languages and libraries (R, Apache Spark, Deedle, Maple, the pandas library in Python and the DataFrames library in Julia), "table" is the term used in MATLAB and SQL.

Each section below covers one language or library that uses this term and is written for readers familiar only with that language.

`data.frame` in R

Data frames (object class data.frame) are one of the basic tabular data structures in the R language, alongside matrices. Unlike matrices, each column can be a different data type. In terms of implementation, a data frame is a list of equal-length column vectors.

Type ?data.frame for help constructing a data frame. An example:

data.frame(
  x = letters[1:5], 
  y = 1:5, 
  z = (1:5) > 3
)
#   x y     z
# 1 a 1 FALSE
# 2 b 2 FALSE
# 3 c 3 FALSE
# 4 d 4  TRUE
# 5 e 5  TRUE

Related functions include is.data.frame, which tests whether an object is a data.frame, and as.data.frame, which coerces many other data structures to a data.frame (through S3 dispatch; see ?S3). Several R packages, including data.table and tibble, extend or modify base R data frames to create new data structures. For further reading, see the section on data frames in the CRAN manual An Introduction to R.

DataFrame in Python's pandas library

The pandas library in Python is the canonical tabular data framework on the SciPy stack, and the DataFrame is its two-dimensional data object. It is essentially a rectangular array, like a 2D NumPy ndarray, but with labeled indices on each axis that pandas uses to align data. As in R, columns take some priority over rows in the implementation: a DataFrame resembles a dictionary with column names as keys and Series (pandas' one-dimensional data structure) as values.
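To make the dictionary analogy and the index alignment concrete, here is a minimal sketch. The Series values and the row labels r1 to r4 are arbitrary, made up purely for illustration.

```python
import pandas as pd

# Two Series whose index labels only partially overlap (illustrative data).
a = pd.Series([1, 2, 3], index=["r1", "r2", "r3"])
b = pd.Series([10, 20, 30], index=["r2", "r3", "r4"])

# Building a DataFrame from a dict: keys become column names, and the row
# index is the union of the Series indices; missing cells are filled with NaN.
df = pd.DataFrame({"a": a, "b": b})
print(df)

# Column access works like a dictionary lookup and returns a Series.
col = df["a"]
print(type(col).__name__)  # Series

# Arithmetic between Series aligns on index labels, not on position;
# labels present in only one operand produce NaN.
print(a + b)
```

Because of this alignment, combining objects with mismatched labels yields missing values (NaN) rather than an error.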

After importing numpy and pandas under the usual aliases (import numpy as np, import pandas as pd), we can construct a DataFrame in several ways, such as passing a dictionary of column names and values:

>>> pd.DataFrame({"x": list("abcde"), "y": range(1,6), "z": np.arange(1,6) > 3})
   x  y      z
0  a  1  False
1  b  2  False
2  c  3  False
3  d  4   True
4  e  5   True
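Two other common ways to construct a DataFrame are sketched below, along with a check that each column keeps its own dtype. The records and array values are arbitrary examples.

```python
import numpy as np
import pandas as pd

# From a list of row records: each dict is one row, and its keys become column names.
records = [{"x": "a", "y": 1}, {"x": "b", "y": 2}]
df_rows = pd.DataFrame(records)

# From a 2D NumPy array, supplying the column labels explicitly.
arr = np.arange(6).reshape(3, 2)  # [[0, 1], [2, 3], [4, 5]]
df_arr = pd.DataFrame(arr, columns=["p", "q"])

# Each column keeps its own dtype, as in an R data.frame.
df_mixed = pd.DataFrame({"x": list("abcde"), "y": range(1, 6), "z": np.arange(1, 6) > 3})
print(df_mixed.dtypes)
```

Both constructors assign a default integer row index (0, 1, 2, and so on) when none is given.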

DataFrame in Apache Spark

A Spark DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood. DataFrames can be constructed from a wide array of sources such as: structured data files, tables in Hive, external databases, or existing RDDs. (source)

DataFrame in Maple

A DataFrame is one of the basic data structures in Maple. It consists of a list of variables, known as DataSeries, which are displayed in a rectangular grid. Every column (variable) in a DataFrame has the same length; however, each variable can have a different type, such as integer, float, string, name, boolean, etc.

When printed, data frames resemble matrices in that they appear as a rectangular grid, but with a key difference: the first row holds the column (variable) names, and the first column holds the row (individual) names. This header row and column are treated as metadata and are not part of the data. The data stored in a DataFrame can be accessed by these header names as well as by the standard numbered index. For more details, see the Guide to DataFrames in the online Maple Programming Help.

349 questions

196 votes

2 answers

Difference between isna() and isnull() in pandas

I have been using pandas for quite some time. But, I don't understand what's the difference between isna() and isnull(). And, more importantly, which one to use when identifying missing values in a dataframe. What is the basic underlying difference…

python pandas dataframe

asked Sep 06 '18 at 10:14

Vaibhav Thakur

9 answers

How do I compare columns in different data frames?

I would like to compare one column of a df with other df's. The columns are names and last names. I'd like to check if a person in one data frame is in another one.

pandas dataframe

asked Jun 12 '18 at 22:34

a_a_a

3 answers

How to sum values grouped by two columns in pandas

I have a Pandas DataFrame like this: df = pd.DataFrame({ 'Date': ['2017-1-1', '2017-1-1', '2017-1-2', '2017-1-2', '2017-1-3'], 'Groups': ['one', 'one', 'one', 'two', 'two'], 'data': range(1, 6)}) Date Groups data 0 …

python pandas dataframe

asked Jul 10 '17 at 15:47

Kevin

2 answers

How to plot two columns of single DataFrame on Y axis

I have two data frames (Action, Comedy). Action contains two columns (year, rating) ratings columns contains average rating with respect to year. The Comedy data frame contains the same two columns with different mean values. I merged both data…

python pandas visualization dataframe

asked Dec 12 '17 at 13:04

Bilal Butt

4 answers

One hot encoding alternatives for large categorical values

I have a data frame with large categorical values over 1600 categories. Is there any way I can find alternatives so that I don't have over 1600 columns? I found this interesting link. But they are converting to class/object which I don't want. I…

machine-learning dataset dataframe dimensionality-reduction encoding

asked Nov 14 '17 at 17:20

vinaykva

2 answers

How to remove rows from a dataframe that are identical to another dataframe?

I have two data frames df1 and df2. For my analysis, I need to remove rows from df1 that have identical column values (Email) in df2? >>df1 First Last Email 0 Adam Smith email@email.com 1 John Brown email2@email.com 2 Joe Max …

python pandas dataframe

asked Aug 21 '18 at 10:22

a_a_a

3 answers

after grouping to minimum value in pandas, how to display the matching row result entirely along min() value

The dataframe contains >> df A B C A 196512 196512 1325 12.9010511000000 196512 196512 114569 12.9267705000000 196512 196512 118910 12.8983353775637 196512 196512 100688 12.9505091000000 196795 196795 …

python pandas dataframe

asked Jan 05 '18 at 04:27

Sam Joe

2 answers

Delete/Drop only the rows which has all values as NaN in pandas

I have a Dataframe, i need to drop the rows which has all the values as NaN. ID Age Gender 601 21 M 501 NaN F NaN NaN NaN The resulting data frame should look like. Id Age Gender 601 21 M 501 …

python pandas dataframe

asked Sep 09 '19 at 09:33

Harshith

5 answers

How to Write Multiple Data Frames in an Excel Sheet

I have multiple data frames with same column names. I want to write them together to an excel sheet stacked vertically on top of each other. And between each, there will be a text occupying a row. This is what I have in mind. I tried the…

pandas dataframe excel data-table

asked Mar 01 '19 at 05:23

Della

2 answers

dataframe.columns.difference() use

I am trying to find the working of dataframe.columns.difference() but couldn't find a satisfactory explanation about it. Can anyone explain the working of this method in detail?

pandas dataframe difference

asked Mar 01 '19 at 00:39

Parth S.

3 answers

Find the consecutive zeros in a DataFrame and do a conditional replacement

I have a dataset like this: Sample Dataframe import pandas as pd df = pd.DataFrame({ 'names': ['A','B','C','D','E','F','G','H','I','J','K','L'], 'col1': [0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0], 'col2': [0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0,…

python pandas dataframe

asked Jul 20 '17 at 19:43

Kevin

2 answers

Pandas merge column duplicate and sum value

How to merge duplicate column and sum their value? What I have A 30 A 40 B 50 What I need A 70 B 50 DF for this example d = {'address': ["A", "A", "B"], 'balances': [30, 40, 50]} df = pd.DataFrame(data=d) df

python pandas dataframe

asked Mar 10 '19 at 06:37

Руслан Миров

2 answers

How to rename columns that have the same name?

I would like to rename the column names, but the Data Frame contains similar column names. How do I rename them? df.columns Output: Index([ 'Goods', 'Durable goods','Services','Exports', 'Goods', 'Services', 'Imports', 'Goods',…

pandas dataframe

asked Nov 20 '18 at 10:26

Antony Naveen

1 answer

How to find the count of consecutive same string values in a pandas dataframe?

Assume that we have the following pandas dataframe: df = pd.DataFrame({'col1':['A>G','C>T','C>T','G>T','C>T', 'A>G','A>G','A>G'],'col2':['TCT','ACA','TCA','TCA','GCT', 'ACT','CTG','ATG'],…

pandas dataframe

asked Nov 19 '18 at 20:03

burcak

2 answers

Mapping column values of one DataFrame to another DataFrame using a key with different header names

I have two data frames df1 and df2 which look something like this. cat1 cat2 cat3 0 10 25 12 1 11 22 14 2 12 30 15 all_cats cat_codes 0 10 A 1 11 B 2 12 C 3 25 …

python pandas dataframe

asked Oct 16 '18 at 15:50

Danny
