Pandas DataFrames in Python
When working with data in Python, the pandas library is one of the most popular tools. It helps manage and analyze data in a simple way. One of the most important features in pandas is the DataFrame.
What is a Pandas DataFrame?
A pandas DataFrame is a table-like data structure. It stores data in rows and columns, just like a spreadsheet or a SQL table. This makes it easy to read, understand, and work with data.
You can use a DataFrame to:
- Store different types of data
- Access and update specific parts of the data
- Perform operations like sorting, filtering, and grouping
The DataFrame is the usual starting point for data analysis in Python, so it is the first structure most beginners learn.
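As a quick illustration of the sorting, filtering, and grouping operations listed above, here is a minimal sketch. The column names and values are made up for the example.

```python
import pandas as pd

# Illustrative data (hypothetical names and scores)
df = pd.DataFrame({
    'Name': ['Tom', 'Jerry', 'Mickey', 'Minnie'],
    'Team': ['A', 'B', 'A', 'B'],
    'Score': [85, 92, 78, 88],
})

# Sorting: order rows by Score, highest first
sorted_df = df.sort_values('Score', ascending=False)

# Filtering: keep only rows where Score is above 80
high_scores = df[df['Score'] > 80]

# Grouping: average Score for each Team
team_means = df.groupby('Team')['Score'].mean()

print(sorted_df)
print(high_scores)
print(team_means)
```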
Main Features of Pandas DataFrame
Let’s understand what makes a DataFrame useful:
1. Two-Dimensional Structure
A DataFrame holds data in two dimensions. This means it has rows and columns, similar to an Excel sheet. Each row has an index label, and each column has a name.
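A minimal sketch of the two dimensions, using illustrative data: `shape` reports (rows, columns), while `index` and `columns` hold the row and column labels.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jerry'], 'Age': [20, 21]})

print(df.shape)          # (2, 2) -> 2 rows, 2 columns
print(list(df.index))    # [0, 1] -> default row labels
print(list(df.columns))  # ['Name', 'Age'] -> column labels
```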
2. Size-Mutable
You can change the size of the DataFrame. You can add or remove rows and columns whenever needed.
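The following sketch, with made-up data, grows a DataFrame by one column and one row and then removes the column again. Assigning to a new row label with `.loc` appends a row.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jerry'], 'Age': [20, 21]})

# Add a column: one value per existing row
df['City'] = ['Paris', 'Rome']

# Add a row: assign to a label that does not exist yet
df.loc[2] = ['Mickey', 19, 'Oslo']

# Remove the column again
df = df.drop(columns='City')

print(df)
```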
3. Heterogeneous Data
A DataFrame can hold different types of data. For example, one column can have numbers while another column has text.
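Each column keeps its own data type (dtype). This small sketch, with illustrative data, mixes text, integers, floats, and booleans; `df.dtypes` lists the type pandas chose for each column.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Jerry'],   # text
    'Age': [20, 21],            # integers
    'Height': [1.75, 1.62],     # floating-point numbers
    'Student': [True, False],   # booleans
})

# One dtype per column
print(df.dtypes)
```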
4. Labeled Axes
The rows and columns in a DataFrame have labels. You can use these labels to easily select data. These features make the DataFrame a powerful structure for handling real-world data.
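To show label-based selection, this sketch uses names as row labels through the `index` parameter instead of the default 0, 1, 2. The data is illustrative.

```python
import pandas as pd

# Row labels are names instead of the default 0, 1, 2
df = pd.DataFrame({'Age': [20, 21, 19]}, index=['Tom', 'Jerry', 'Mickey'])

# Select one value by row label and column label
print(df.loc['Jerry', 'Age'])  # 21
```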
Pandas DataFrame Analogy
You can think of a pandas DataFrame as a dictionary of Series. A Series in pandas is a one-dimensional array with labels. So, a DataFrame is like a bunch of Series placed side by side, sharing the same row labels. This mental model helps explain how pandas lines up data by row label.
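A minimal sketch of the dictionary-of-Series idea, with illustrative data: the two Series below list the same names in a different order, yet pandas matches each value to the right row by its label.

```python
import pandas as pd

# Same labels, listed in a different order
ages = pd.Series([20, 21], index=['Tom', 'Jerry'])
cities = pd.Series(['Rome', 'Paris'], index=['Jerry', 'Tom'])

# Each key becomes a column; rows are aligned by label, not by position
df = pd.DataFrame({'Age': ages, 'City': cities})
print(df)

# Selecting a column gives back a Series
print(type(df['Age']))
```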
How to Use Pandas in Python
Before you can use a DataFrame, you need to install and import pandas. Install pandas from the command line with pip:
```shell
pip install pandas
```
To use pandas in your Python code, import it like this:
```python
import pandas as pd
```
The `pd` part is a short name, or alias, that makes your code cleaner and easier to read.
Create a Pandas DataFrame
This section is for beginners who want to know how to create a DataFrame in Python using the pandas library. We'll use basic examples that are easy to understand.
Different Ways to Create a Pandas DataFrame
You can create a pandas DataFrame in multiple ways. The most common methods are:
- From a list
- From a list of lists
- From a dictionary
- From a list of dictionaries
Let’s look at each method with simple examples.
1. Create DataFrame from a List
You can create a DataFrame from a single list. In this case, each element in the list becomes a row.
Example:
```python
import pandas as pd

# A list of strings
data = ['Python', 'Pandas', 'Data', 'Frame']

# Create DataFrame
df = pd.DataFrame(data)

# Display result
print(df)
```
Output:
```text
        0
0  Python
1  Pandas
2    Data
3   Frame
```
Here, pandas automatically labels the single column `0` and numbers the rows from `0` to `3`.
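If you want a meaningful column name instead of `0`, you can pass a `columns` list. The name `Word` here is only an illustration.

```python
import pandas as pd

data = ['Python', 'Pandas', 'Data', 'Frame']

# Give the single column a name instead of the default label 0
df = pd.DataFrame(data, columns=['Word'])
print(df)
```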
2. Create DataFrame from a List of Lists
Each inner list becomes a row, and each item in an inner list becomes the value for one column.
Example:
```python
import pandas as pd

# List of lists
data = [['Tom', 20], ['Jerry', 22], ['Mickey', 25]]

# Create DataFrame
df = pd.DataFrame(data, columns=['Name', 'Age'])

print(df)
```
Output:
```text
     Name  Age
0     Tom   20
1   Jerry   22
2  Mickey   25
```
This example shows how you can name the columns with the `columns` parameter.
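In the same way, the `index` parameter replaces the default row numbers with your own labels. A minimal sketch; the labels `'a'`, `'b'`, `'c'` are arbitrary.

```python
import pandas as pd

data = [['Tom', 20], ['Jerry', 22], ['Mickey', 25]]

# 'columns' names the columns; 'index' sets custom row labels
df = pd.DataFrame(data, columns=['Name', 'Age'], index=['a', 'b', 'c'])
print(df)
```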
3. Create DataFrame from a Dictionary
A dictionary can also be used to create a DataFrame. The keys in the dictionary become column names, and the values become column data.
Example:
```python
import pandas as pd

# Dictionary with equal-length lists
data = {
    'Name': ['Tom', 'Nick', 'Krish', 'Jack'],
    'Age': [20, 21, 19, 18]
}

df = pd.DataFrame(data)

print(df)
```
Output:
```text
    Name  Age
0    Tom   20
1   Nick   21
2  Krish   19
3   Jack   18
```
Note: All the lists in the dictionary must have the same length. Otherwise, pandas raises a `ValueError`.
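Here is a small sketch of that error, using illustrative data where `Age` is one value short. The `try`/`except` only exists to capture the message.

```python
import pandas as pd

# 'Age' has three values while 'Name' has four
data = {
    'Name': ['Tom', 'Nick', 'Krish', 'Jack'],
    'Age': [20, 21, 19],
}

error_message = None
try:
    df = pd.DataFrame(data)
except ValueError as err:
    # pandas refuses to build a DataFrame from lists of different lengths
    error_message = str(err)

print(error_message)
```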
4. Create DataFrame from a List of Dictionaries
Each dictionary becomes a row, and the keys become column names.
Example:
```python
import pandas as pd

# List of dictionaries
data = [
    {'Name': 'Tom', 'Age': 20},
    {'Name': 'Nick', 'Age': 21},
    {'Name': 'Krish', 'Age': 19}
]

df = pd.DataFrame(data)

print(df)
```
Output:
```text
    Name  Age
0    Tom   20
1   Nick   21
2  Krish   19
```
This method is very common when loading data from external sources like JSON or APIs.
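Records from real sources do not always have the same keys. In this sketch (illustrative data), pandas collects every key it sees as a column and fills the gaps with `NaN`.

```python
import pandas as pd

# The second record has no 'Age'; only the third has a 'City'
data = [
    {'Name': 'Tom', 'Age': 20},
    {'Name': 'Nick'},
    {'Name': 'Krish', 'Age': 19, 'City': 'Delhi'},
]

df = pd.DataFrame(data)

# Missing values show up as NaN
print(df)
```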
Rows and Columns in a DataFrame
This section is helpful for beginners who want to learn how to access, select, and update rows and columns in a pandas DataFrame. It covers basic operations that are used often in data analysis.
Access Columns in a DataFrame
To access a column, you can use either square brackets (`[]`) or dot (`.`) notation.
Example 1: Using Square Brackets
```python
import pandas as pd

data = {
    'Name': ['Tom', 'Jerry', 'Mickey'],
    'Age': [20, 21, 19]
}

df = pd.DataFrame(data)

# Access 'Name' column
print(df['Name'])
```
Output:
```text
0       Tom
1     Jerry
2    Mickey
Name: Name, dtype: object
```
Square brackets work with any column name, so use them when the name has spaces or special characters.
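For example, a column called `First Name` (an illustrative name) cannot be reached with dot notation because of the space, but square brackets handle it fine.

```python
import pandas as pd

df = pd.DataFrame({'First Name': ['Tom', 'Jerry'], 'Age': [20, 21]})

# df.First Name would be a syntax error; brackets accept any label
print(df['First Name'])
```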
Example 2: Using Dot Notation
```python
print(df.Name)
```
This gives the same output. However, avoid dot notation if the column name contains spaces or matches an existing DataFrame attribute or method, such as `count` or `shape`.
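A minimal sketch of such a name clash, with illustrative data: `count` is both a column here and a built-in DataFrame method, and dot notation picks the method.

```python
import pandas as pd

# 'count' is a column name and also the name of a DataFrame method
df = pd.DataFrame({'Name': ['Tom', 'Jerry'], 'count': [3, 5]})

print(df['count'])  # brackets return the column
print(df.count)     # dot notation returns the DataFrame.count method instead
```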
Access Multiple Columns
Pass a list of column names inside the square brackets to get more than one column. The result is a DataFrame.
```python
print(df[['Name', 'Age']])
```
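The number of brackets decides the result type. This sketch, with illustrative data, shows that a single label gives a Series while a list of labels (even a list with one name) gives a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jerry'], 'Age': [20, 21]})

one = df['Name']                  # single label -> Series
several = df[['Name', 'Age']]     # list of labels -> DataFrame
one_as_frame = df[['Name']]       # list with one label -> still a DataFrame

print(type(one).__name__, type(several).__name__, type(one_as_frame).__name__)
```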
Access Rows in a DataFrame
You can use .loc[]
or .iloc[]
to access rows.
1. .loc[] for Row by Label
`.loc[]` selects rows by their index label. Use it when you know a row's label.
python
```python
# Get row with index label 1
print(df.loc[1])
```
Output:
```text
Name    Jerry
Age        21
Name: 1, dtype: object
```
2. .iloc[] for Row by Position
`.iloc[]` selects rows by their integer position, just like list indexing.
python
```python
# Get second row (position 1)
print(df.iloc[1])
```
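This gives the same output as `df.loc[1]` here, because the default row labels (`0`, `1`, `2`) match the row positions. If the index has other labels, `.loc[]` and `.iloc[]` can return different rows.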
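A minimal sketch where labels and positions differ, using the illustrative labels 10, 20, 30: `.iloc[1]` still means "second row", while `.loc[]` needs the actual label.

```python
import pandas as pd

# Row labels that do not match row positions
df = pd.DataFrame(
    {'Name': ['Tom', 'Jerry', 'Mickey'], 'Age': [20, 21, 19]},
    index=[10, 20, 30],
)

print(df.iloc[1]['Name'])  # position 1 -> Jerry
print(df.loc[20, 'Name'])  # label 20 -> Jerry
# df.loc[1] would raise a KeyError here, because no row is labeled 1
```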
Access a Cell (Specific Value)
You can combine row and column selection.
```python
# Get the value in the row labeled 1, column 'Name'
print(df.loc[1, 'Name'])  # Output: Jerry
```
Or using position:
```python
print(df.iloc[1, 0])  # Output: Jerry
```
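For a single cell, pandas also provides the `.at[]` (by label) and `.iat[]` (by position) accessors, which can read or update one value. A short sketch using the same illustrative data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jerry', 'Mickey'], 'Age': [20, 21, 19]})

print(df.at[1, 'Name'])  # by label -> Jerry
print(df.iat[1, 0])      # by position -> Jerry

# The same accessors can update one cell
df.at[1, 'Age'] = 22
```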
Add a New Column
You can add a new column by assigning a list to a new column name. The list needs one value for each row.
```python
df['Country'] = ['USA', 'UK', 'Canada']
print(df)
```
Output:
```text
     Name  Age Country
0     Tom   20     USA
1   Jerry   21      UK
2  Mickey   19  Canada
```
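A new column does not have to come from a list. This sketch (the column names `Age Next Year` and `Active` are illustrative) computes one column from another and fills a second one with a single repeated value.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jerry', 'Mickey'], 'Age': [20, 21, 19]})

# Computed from an existing column, row by row
df['Age Next Year'] = df['Age'] + 1

# A single value is repeated for every row
df['Active'] = True

print(df)
```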
Remove a Column
Use the `drop()` method with `axis=1` to remove a column.
```python
# drop() returns a new DataFrame, so assign the result back to df
df = df.drop('Country', axis=1)
print(df)
```
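A few related forms of `drop()`, sketched with the same illustrative data: `columns=` is an explicit alternative to `axis=1`, and the default `axis=0` removes rows by index label. None of these calls change the original DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Jerry', 'Mickey'],
    'Age': [20, 21, 19],
    'Country': ['USA', 'UK', 'Canada'],
})

# Same as df.drop('Country', axis=1)
no_country = df.drop(columns='Country')

# Default axis=0: drop the row whose index label is 1
no_jerry = df.drop(1)

# The original is untouched
print(df.shape)  # (3, 3)
```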