Pandas Dataframe – All Operations

Detailed Document on Pandas DataFrame Operations


Introduction to Pandas DataFrame #

Pandas is a Python library for data analysis and manipulation. A DataFrame is a two-dimensional, size-mutable data structure with labeled rows and columns, in which each column can hold a different data type. It is similar to a table in a relational database or an Excel spreadsheet.


Creating a DataFrame #
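
A common way to create a DataFrame is from a dictionary that maps column names to lists of values. The sketch below assumes a small, made-up dataset; the column names (name, age, city) are illustrative only.

```python
import pandas as pd

# Hypothetical sample data: each key becomes a column, each list its values.
data = {
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["Paris", "London", "Berlin"],
}
df = pd.DataFrame(data)

# A default integer index (0, 1, 2) is assigned automatically.
print(df)
```

A DataFrame can also be built from a list of dictionaries (one per row) or loaded from a file.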


Basic Operations #

1. Viewing Data #

  • head(n): View the first n rows (default: 5).
  • tail(n): View the last n rows (default: 5).
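  • info(): Concise summary of the DataFrame (index range, column names, non-null counts, dtypes, and memory usage).
  • describe(): Statistical summary of numerical columns (count, mean, standard deviation, min, quartiles, max).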
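
The viewing methods listed above could be used as follows. This is a minimal sketch on a made-up ten-row DataFrame; the column names id and score are assumptions for illustration.

```python
import pandas as pd

# Hypothetical ten-row DataFrame.
df = pd.DataFrame({
    "id": range(1, 11),
    "score": [x * 1.5 for x in range(1, 11)],
})

first_three = df.head(3)   # first 3 rows
last_two = df.tail(2)      # last 2 rows
df.info()                  # prints the summary directly (returns None)
summary = df.describe()    # returns a DataFrame of statistics

print(first_three)
print(last_two)
print(summary)
```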

2. Accessing Data #

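  • Column selection: df['column_name'] or df.column_name (attribute access works only when the name is a valid Python identifier and does not clash with an existing DataFrame attribute or method)
  • Row selection: df.loc[label] (label-based) or df.iloc[position] (integer position-based)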
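
To make the difference between label-based and position-based access visible, the sketch below uses a made-up DataFrame with string index labels; the labels and column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data with string labels as the index.
df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Charlie"], "age": [25, 30, 35]},
    index=["a", "b", "c"],
)

ages = df["age"]        # column by bracket notation (returns a Series)
same_ages = df.age      # same column by attribute access

row_b = df.loc["b"]     # row by index label
row_1 = df.iloc[1]      # row by integer position (second row)

print(row_b)
```

Here df.loc["b"] and df.iloc[1] return the same row, because the label "b" sits at position 1.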

Data Manipulation #

1. Adding Columns #
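
A new column can be added by assigning to a new column name, or with assign(), which returns a new DataFrame. A minimal sketch, assuming made-up price and quantity columns:

```python
import pandas as pd

# Hypothetical order data.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "quantity": [1, 2, 3]})

# Direct assignment modifies df in place.
df["total"] = df["price"] * df["quantity"]

# assign() leaves the original untouched and returns a copy with the new column.
df = df.assign(discounted=df["total"] * 0.9)

print(df)
```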

2. Dropping Columns or Rows #

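  • drop(): Remove specific rows or columns by label. It returns a new DataFrame and leaves the original unchanged unless inplace=True is passed.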
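
As a sketch of drop() on a made-up three-column DataFrame (the column names a, b, c are illustrative), the columns and index keywords select what to remove:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

without_c = df.drop(columns=["c"])       # remove a column by name
without_first_row = df.drop(index=[0])   # remove a row by index label

print(without_c)
print(without_first_row)
```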

3. Renaming Columns #
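
Columns can be renamed with rename() and a mapping from old names to new names. The names used below are placeholders:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Only the columns listed in the mapping are renamed; a new DataFrame is returned.
renamed = df.rename(columns={"a": "alpha", "b": "beta"})

print(renamed.columns.tolist())
```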

4. Filtering Data #
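
Rows are usually filtered with a boolean mask. The sketch below assumes a made-up DataFrame with name, age, and city columns; multiple conditions are combined with & (and) or | (or), each wrapped in parentheses.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["Paris", "London", "Berlin"],
})

over_28 = df[df["age"] > 28]                                   # single condition
london_over_28 = df[(df["age"] > 28) & (df["city"] == "London")]  # combined conditions
via_query = df.query("age > 28 and city == 'London'")         # same filter as a query string

print(over_28)
```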

5. Sorting Data #
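
Sorting by one or more columns can be done with sort_values(). A minimal sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob", "Charlie"], "age": [30, 25, 30]})

# Ascending sort on a single column.
by_age = df.sort_values("age")

# Age descending, then name ascending to break ties.
by_age_then_name = df.sort_values(["age", "name"], ascending=[False, True])

print(by_age_then_name)
```

sort_index() sorts by the index instead of by column values.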


Aggregation and Grouping #

1. Aggregation Functions #

  • sum(), mean(), min(), max(), etc.
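
The aggregation functions above work on a single column or on the whole DataFrame. A short sketch with made-up sales and units columns:

```python
import pandas as pd

df = pd.DataFrame({"sales": [100, 200, 300], "units": [1, 2, 3]})

total_sales = df["sales"].sum()   # one value for one column
column_means = df.mean()          # one value per column (a Series)
stats = df.agg(["min", "max"])    # several aggregations at once (a DataFrame)

print(total_sales)
print(column_means)
print(stats)
```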

2. Grouping Data #
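
groupby() splits the rows into groups by the values of one or more columns and then applies an aggregation to each group. The sketch below assumes made-up dept and salary columns; the output column names total and headcount are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "dept": ["A", "B", "A", "B"],
    "salary": [100, 200, 300, 400],
})

# Mean salary per department.
mean_by_dept = df.groupby("dept")["salary"].mean()

# Several named aggregations in one pass.
summary = df.groupby("dept").agg(
    total=("salary", "sum"),
    headcount=("salary", "count"),
)

print(mean_by_dept)
print(summary)
```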


Handling Missing Data #

1. Detecting Missing Values #

  • isnull(): Check for missing values (alias of isna()); returns a boolean mask of the same shape.
  • notnull(): Check for non-missing values (alias of notna()).
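
A typical use of these checks is counting the missing values per column. A minimal sketch on made-up data with one gap in each column:

```python
import pandas as pd

# None is stored as a missing value (NaN in the numeric column).
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": ["x", None, "z"]})

mask = df.isnull()                        # True where a value is missing
missing_per_column = df.isnull().sum()    # count of missing values per column
present = df.notnull()                    # True where a value is present

print(missing_per_column)
```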

2. Filling Missing Values #
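
Missing values can be replaced with fillna(), or carried forward from the previous row with ffill(). The fill values below (0.0, "unknown", the column mean) are illustrative choices, not recommendations:

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, None, 3.0], "b": ["x", None, "z"]})

# A different constant per column.
filled_const = df.fillna({"a": 0.0, "b": "unknown"})

# Fill a numeric column with its own mean.
filled_mean = df["a"].fillna(df["a"].mean())

# Forward fill: propagate the last valid value downward.
forward = df.ffill()

print(filled_const)
```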

3. Dropping Missing Values #
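
dropna() removes rows (or columns) that contain missing values. A sketch on made-up data, where column c is complete and columns a and b each have one gap:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, None, 3.0],
    "b": ["x", "y", None],
    "c": [1, 2, 3],
})

complete_rows = df.dropna()              # drop rows with any missing value
rows_with_a = df.dropna(subset=["a"])    # only consider column "a"
complete_cols = df.dropna(axis=1)        # drop columns with any missing value

print(complete_rows)
```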


Merging, Joining, and Concatenation #

1. Merging #
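
merge() combines two DataFrames on one or more key columns, much like a SQL join. The sketch below assumes two made-up tables that share an emp_id column:

```python
import pandas as pd

employees = pd.DataFrame({"emp_id": [1, 2, 3], "name": ["Alice", "Bob", "Charlie"]})
salaries = pd.DataFrame({"emp_id": [1, 2, 4], "salary": [50000, 60000, 70000]})

# Inner join: only keys present in both tables.
inner = pd.merge(employees, salaries, on="emp_id", how="inner")

# Left join: keep every employee; missing salaries become NaN.
left = employees.merge(salaries, on="emp_id", how="left")

print(inner)
print(left)
```

The how parameter also accepts "right" and "outer".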

2. Concatenation #
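
concat() stacks DataFrames along an axis: rows by default, columns with axis=1. A minimal sketch with made-up frames:

```python
import pandas as pd

q1 = pd.DataFrame({"month": ["Jan", "Feb"], "sales": [100, 120]})
q2 = pd.DataFrame({"month": ["Apr", "May"], "sales": [90, 150]})
regions = pd.DataFrame({"region": ["EU", "US"]})

# Stack rows; ignore_index=True renumbers the index 0..n-1.
stacked = pd.concat([q1, q2], ignore_index=True)

# Place columns side by side, aligned on the index.
side_by_side = pd.concat([q1, regions], axis=1)

print(stacked)
print(side_by_side)
```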

3. Joining #
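
join() combines DataFrames on their index rather than on a column, and performs a left join by default. The index values and column names below are made up for illustration:

```python
import pandas as pd

left = pd.DataFrame({"name": ["Alice", "Bob"]}, index=[1, 2])
right = pd.DataFrame({"salary": [50000, 70000]}, index=[1, 3])

# Default left join: every row of `left` is kept.
joined = left.join(right)

# Inner join: only index values found in both.
inner_joined = left.join(right, how="inner")

print(joined)
print(inner_joined)
```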


Advanced Operations #

1. Applying Functions #
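
apply() runs a function on each element of a Series, or on each column or row of a DataFrame. A sketch with made-up columns:

```python
import pandas as pd

df = pd.DataFrame({"name": ["alice", "bob"], "a": [1, 2], "b": [10, 20]})

# Element-wise on a single column.
df["name_upper"] = df["name"].apply(str.upper)

# Row-wise: axis=1 passes each row to the function.
df["row_sum"] = df[["a", "b"]].apply(lambda row: row["a"] + row["b"], axis=1)

# Column-wise (the default): each column is passed as a Series.
col_ranges = df[["a", "b"]].apply(lambda col: col.max() - col.min())

print(df)
print(col_ranges)
```

For simple arithmetic such as the row sum, the vectorized form df["a"] + df["b"] is usually faster than a row-wise apply().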

2. Pivot Tables #
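
pivot_table() reshapes data into a grid, aggregating values that share the same row and column keys. The sketch below assumes made-up region, product, and sales columns:

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["East", "East", "East", "West", "West"],
    "product": ["A", "A", "B", "A", "B"],
    "sales": [10, 5, 20, 30, 40],
})

# One row per region, one column per product, sales summed per cell.
pivot = df.pivot_table(values="sales", index="region", columns="product", aggfunc="sum")

print(pivot)
```

The two East/A rows are combined into a single cell by the aggregation function.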

3. Working with Dates #
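
Date strings are typically converted with pd.to_datetime(), after which the .dt accessor exposes date components. A minimal sketch on made-up dates:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2024-01-15", "2024-02-20", "2024-03-25"],
    "value": [1, 2, 3],
})

# Convert strings to datetime64 values.
df["date"] = pd.to_datetime(df["date"])

# Extract components with the .dt accessor.
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["weekday"] = df["date"].dt.day_name()

# Filter by date.
from_feb = df[df["date"] >= pd.Timestamp("2024-02-01")]

print(df)
```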


Saving and Loading Data #

1. Saving to a File #
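
A DataFrame can be written out with methods such as to_csv() and to_json(). The sketch below writes to a temporary directory so that it runs anywhere; the file names are hypothetical.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [25, 30]})

# Temporary directory used only for this example.
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "people.csv")
json_path = os.path.join(out_dir, "people.json")

df.to_csv(csv_path, index=False)          # index=False omits the row index column
df.to_json(json_path, orient="records")   # one JSON object per row

print(csv_path)
```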

2. Loading from a File #
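
read_csv() loads CSV data into a DataFrame. To stay self-contained, the sketch below reads from an in-memory string via io.StringIO; with a real file, the file path would be passed instead.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a file on disk.
csv_text = "name,age\nAlice,25\nBob,30\n"

df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)  # column types are inferred from the data
```

Similar readers exist for other formats, for example read_json() and read_excel().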


Conclusion #

Pandas DataFrame provides a versatile and efficient way to handle and analyze structured data. Mastering these operations will significantly enhance your data analysis workflow.

Updated on December 23, 2024