Pandas DataFrames in Python

When working with data in Python, the pandas library is one of the most popular tools. It helps manage and analyze data in a simple way. One of the most important features in pandas is the DataFrame.

What is a Pandas DataFrame?

A pandas DataFrame is a table-like data structure. It stores data in rows and columns, just like a spreadsheet or a SQL table. This makes it easy to read, understand, and work with data.

You can use a DataFrame to:

  • Store different types of data
  • Access and update specific parts of the data
  • Perform operations like sorting, filtering, and grouping

For beginners, the DataFrame is the usual starting point for data analysis in Python.
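
As a rough illustration of sorting, filtering, and grouping, here is a minimal sketch. The sample data and the column names (Name, Team, Age) are made up for this example.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    'Name': ['Tom', 'Jerry', 'Mickey', 'Minnie'],
    'Team': ['A', 'B', 'A', 'B'],
    'Age': [20, 22, 25, 19],
})

# Sorting: order rows by Age, youngest first
by_age = df.sort_values('Age')

# Filtering: keep only rows where Age is above 20
older = df[df['Age'] > 20]

# Grouping: average Age per Team
mean_age = df.groupby('Team')['Age'].mean()

print(by_age)
print(older)
print(mean_age)
```

Each operation returns a new object and leaves df unchanged.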

Main Features of Pandas DataFrame

Let’s understand what makes a DataFrame useful:

1. Two-Dimensional Structure

A DataFrame holds data in two dimensions. This means it has rows and columns, similar to an Excel sheet. Each row and each column has a label or index.

2. Size-Mutable

You can change the size of the DataFrame. You can add or remove rows and columns whenever needed.
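
A small sketch of what size-mutable means in practice. The data and the City column are hypothetical, and pd.concat is only one of several ways to add a row.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jerry'], 'Age': [20, 22]})
print(df.shape)  # (2, 2)

# Add a column
df['City'] = ['Paris', 'Rome']

# Add a row by concatenating a one-row DataFrame
new_row = pd.DataFrame({'Name': ['Mickey'], 'Age': [25], 'City': ['Oslo']})
df = pd.concat([df, new_row], ignore_index=True)
print(df.shape)  # (3, 3)

# Remove the row labeled 0 and the 'City' column
df = df.drop(index=0).drop(columns='City')
print(df.shape)  # (2, 2)
```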

3. Heterogeneous Data

A DataFrame can hold different types of data. For example, one column can have numbers while another column has text.
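
To see this, you can inspect the dtypes attribute, which reports the type pandas inferred for each column. The columns below are made up for illustration, and the exact name shown for the text column may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Jerry'],   # text
    'Age': [20, 22],            # whole numbers
    'Height': [1.75, 1.68],     # decimals
    'Member': [True, False],    # booleans
})

# One data type per column
print(df.dtypes)
```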

4. Labeled Axes

The rows and columns in a DataFrame have labels. You can use these labels to easily select data.

These features make the DataFrame a powerful structure for handling real-world data.
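
To illustrate labeled axes, here is a minimal sketch. The row labels 'tom' and 'jerry' are hypothetical, chosen to show that labels do not have to be numbers.

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [20, 22], 'City': ['Paris', 'Rome']},
    index=['tom', 'jerry'],  # custom row labels (hypothetical)
)

print(df.index.tolist())    # row labels: ['tom', 'jerry']
print(df.columns.tolist())  # column labels: ['Age', 'City']

# Select a single value by row label and column label
print(df.loc['jerry', 'City'])  # Rome
```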

Pandas DataFrame Analogy

You can think of a pandas DataFrame as a dictionary of Series. A Series in pandas is a one-dimensional array with labels. So, a DataFrame is like a bunch of Series placed side by side, sharing the same row labels. This idea helps you understand how pandas stores and aligns data internally.

How to Use Pandas in Python

Before you use a DataFrame, you need to install and import pandas. You can install pandas by running this command in your terminal:

shell
pip install pandas

To use pandas in your Python code, import it like this:

python
import pandas as pd

Here, pd is the conventional alias for pandas. It keeps your code shorter and easier to read.

Create a Pandas DataFrame

This section is for beginners who want to know how to create a DataFrame in Python using the pandas library. We'll use basic examples that are easy to understand.

Different Ways to Create a Pandas DataFrame

You can create a pandas DataFrame in multiple ways. The most common methods are:

  • From a list
  • From a list of lists
  • From a dictionary
  • From a list of dictionaries

Let’s look at each method with simple examples.

1. Create DataFrame from a List

You can create a DataFrame from a single list. In this case, each element in the list becomes a row.

Example:

python
import pandas as pd

# A list of strings
data = ['Python', 'Pandas', 'Data', 'Frame']

# Create DataFrame
df = pd.DataFrame(data)

# Display result
print(df)

Output:

text
        0
0  Python
1  Pandas
2    Data
3   Frame

Here, pandas automatically assigns the default column label 0 and the row labels 0 to 3.
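
If the default labels are not descriptive enough, you can supply your own. This is a minimal sketch using the columns and index parameters of pd.DataFrame. The column name 'Word' and the row labels are made up for illustration.

```python
import pandas as pd

data = ['Python', 'Pandas', 'Data', 'Frame']

# Name the single column and supply custom row labels
df = pd.DataFrame(data, columns=['Word'], index=['a', 'b', 'c', 'd'])
print(df)

print(df.loc['c', 'Word'])  # Data
```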

2. Create DataFrame from a List of Lists

Each inner list becomes a row, and each item in an inner list becomes the value for one column of that row.

Example:

python
import pandas as pd

# List of lists
data = [['Tom', 20], ['Jerry', 22], ['Mickey', 25]]

# Create DataFrame
df = pd.DataFrame(data, columns=['Name', 'Age'])

print(df)

Output:

text
     Name  Age
0     Tom   20
1   Jerry   22
2  Mickey   25

This example shows how you can give names to the columns using the columns parameter.

3. Create DataFrame from a Dictionary

A dictionary can also be used to create a DataFrame. The keys in the dictionary become column names, and the values become column data.

Example:

python
import pandas as pd

# Dictionary with equal-length lists
data = {
    'Name': ['Tom', 'Nick', 'Krish', 'Jack'],
    'Age': [20, 21, 19, 18]
}

df = pd.DataFrame(data)

print(df)

Output:

text
    Name  Age
0    Tom   20
1   Nick   21
2  Krish   19
3   Jack   18

Note: Make sure all the lists in the dictionary have the same length. Otherwise, pandas raises a ValueError.
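
The following sketch shows what happens when the lengths differ. The exact wording of the error message may vary between pandas versions, so the code only prints it.

```python
import pandas as pd

# The lists have different lengths (3 names, 2 ages)
data = {
    'Name': ['Tom', 'Nick', 'Krish'],
    'Age': [20, 21],
}

message = None
try:
    df = pd.DataFrame(data)
except ValueError as err:
    # pandas refuses to build a table with uneven columns
    message = str(err)
    print('Could not build DataFrame:', message)
```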

4. Create DataFrame from a List of Dictionaries

Each dictionary becomes a row, and the keys become column names.

Example:

python
import pandas as pd

# List of dictionaries
data = [
    {'Name': 'Tom', 'Age': 20},
    {'Name': 'Nick', 'Age': 21},
    {'Name': 'Krish', 'Age': 19}
]

df = pd.DataFrame(data)

print(df)

Output:

text
    Name  Age
0    Tom   20
1   Nick   21
2  Krish   19

This method is very common when loading data from external sources like JSON or APIs.
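
As a sketch of the JSON case, the example below parses a JSON string with the standard json module and passes the resulting list of dictionaries to pd.DataFrame. The payload is hypothetical. Its second record deliberately omits the Age key to show how pandas handles missing keys.

```python
import json
import pandas as pd

# A hypothetical JSON payload, such as an API might return
payload = '[{"Name": "Tom", "Age": 20}, {"Name": "Nick"}]'

records = json.loads(payload)  # a list of dictionaries
df = pd.DataFrame(records)
print(df)

# A key missing from a record becomes NaN in that row
print(df['Age'].isna().tolist())  # [False, True]
```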

Rows and Columns in a DataFrame

This section is helpful for beginners who want to learn how to access, select, and update rows and columns in a pandas DataFrame. It covers basic operations that are used often in data analysis.

Access Columns in a DataFrame

To access a column, you can use either square brackets [] or dot . notation.

Example 1: Using Square Brackets

python
import pandas as pd

data = {
    'Name': ['Tom', 'Jerry', 'Mickey'],
    'Age': [20, 21, 19]
}

df = pd.DataFrame(data)

# Access 'Name' column
print(df['Name'])

Output:

text
0       Tom
1     Jerry
2    Mickey
Name: Name, dtype: object

Use square brackets if the column name has spaces or special characters.

Example 2: Using Dot Notation

python
print(df.Name)

This gives the same output. However, avoid dot notation if the column name contains spaces or clashes with the name of a DataFrame attribute or method.
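
Here is a minimal sketch of the name clash. The columns 'count' and 'Home City' are made up for illustration. 'count' is also the name of a DataFrame method, and 'Home City' contains a space.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Jerry'],
    'count': [3, 5],                 # same name as the DataFrame.count() method
    'Home City': ['Paris', 'Rome'],  # contains a space
})

# Dot notation finds the method, not the column
print(callable(df.count))        # True

# Square brackets always return the column
print(df['count'].tolist())      # [3, 5]
print(df['Home City'].tolist())  # ['Paris', 'Rome']
```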

Access Multiple Columns

You can pass a list of column names to get more than one column.

python
print(df[['Name', 'Age']])
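
One detail worth checking for yourself: a single column name in brackets returns a Series, while a list of names returns a DataFrame, even if the list has only one name. A short sketch:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jerry', 'Mickey'], 'Age': [20, 21, 19]})

one = df['Name']    # single label: a Series
two = df[['Name']]  # list of labels: a DataFrame

print(type(one).__name__)  # Series
print(type(two).__name__)  # DataFrame
print(two.shape)           # (3, 1)
```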

Access Rows in a DataFrame

You can use .loc[] or .iloc[] to access rows.

1. .loc[] for Row by Label

.loc[] selects rows by their index label. Use it when you know the label of the row you want.

python
# Get row with index label 1
print(df.loc[1])

Output:

text
Name    Jerry
Age        21
Name: 1, dtype: object

2. .iloc[] for Row by Position

.iloc[] is used for accessing rows by their position (like using list indexing).

python
# Get second row (position 1)
print(df.iloc[1])

This prints the same output as .loc[1] here, because the default row labels match the row positions.
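
The two accessors give different results once the row labels stop matching the positions. The sketch below uses hypothetical labels 10, 20, and 30 to make the difference visible.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Tom', 'Jerry', 'Mickey'], 'Age': [20, 21, 19]},
    index=[10, 20, 30],  # hypothetical labels that differ from positions
)

print(df.loc[20, 'Name'])  # label 20 -> Jerry
print(df.iloc[1, 0])       # position 1, column 0 -> Jerry

# No row has the label 1, so df.loc[1] would raise a KeyError here
print(1 in df.index)       # False
```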

Access a Cell (Specific Value)

You can combine row and column selection.

python
# Get the value in row 1, column 'Name'
print(df.loc[1, 'Name'])  # Output: Jerry

Or using position:

python
print(df.iloc[1, 0])  # Output: Jerry
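
The same selectors can be used on the left side of an assignment to update a cell. A minimal sketch, with replacement values made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jerry', 'Mickey'], 'Age': [20, 21, 19]})

# Update one cell by row label and column label
df.loc[1, 'Age'] = 22

# Update one cell by position (row 2, column 0)
df.iloc[2, 0] = 'Minnie'

print(df)
```

Writing through a single .loc or .iloc call, rather than chaining two selections, ensures that the change lands in df itself.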

Add a New Column

You can add a new column using assignment.

python
df['Country'] = ['USA', 'UK', 'Canada']
print(df)

Output:

text
     Name  Age Country
0     Tom   20     USA
1   Jerry   21      UK
2  Mickey   19  Canada
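
The assigned value does not have to be a list. As a sketch, a single value is repeated for every row, and an expression on an existing column produces a computed column. The column names Active and AgeNextYear are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jerry', 'Mickey'], 'Age': [20, 21, 19]})

# A single value is repeated for every row
df['Active'] = True

# A column computed from an existing column
df['AgeNextYear'] = df['Age'] + 1

print(df)
```

When you assign a list, its length must match the number of rows. Otherwise pandas raises a ValueError.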

Remove a Column

Use the drop() method with axis=1. It returns a new DataFrame, so assign the result back to df.

python
df = df.drop('Country', axis=1)
print(df)
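
As an alternative sketch, drop() also accepts a columns parameter, which avoids the axis argument and takes a list to remove several columns at once. The variable name trimmed is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Jerry'],
    'Age': [20, 21],
    'Country': ['USA', 'UK'],
})

# columns= is an alternative to axis=1; a list drops several columns at once
trimmed = df.drop(columns=['Age', 'Country'])

print(trimmed.columns.tolist())  # ['Name']
print(df.columns.tolist())       # ['Name', 'Age', 'Country'] (original unchanged)
```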

Frequently Asked Questions