Top 15 Functions of Pandas - Programmingdoor

Pandas is a popular Python library for data manipulation and analysis. It provides numerous functions, methods, and attributes for working with data in the form of DataFrame and Series objects. Here’s an overview of some of the most commonly used ones:

1. pandas.read_csv(): Reads data from a CSV file and creates a DataFrame.

import pandas as pd

df = pd.read_csv('data.csv')
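As a minimal sketch of the same call without needing a file on disk, the example below feeds read_csv an in-memory buffer. The CSV text and the variable names are made up for illustration; usecols is a real read_csv parameter that limits parsing to the listed columns.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file such as 'data.csv'.
csv_text = "Column1,Column2\n1,A\n2,B\n3,C\n"

# read_csv accepts a path or any file-like object, so StringIO works here.
df = pd.read_csv(io.StringIO(csv_text))

# usecols restricts parsing to the named columns.
only_first = pd.read_csv(io.StringIO(csv_text), usecols=["Column1"])
```

In a real script the first argument would usually be a file path or URL instead of a StringIO buffer.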

2. pandas.DataFrame(): Creates a DataFrame from various data sources, including lists, dictionaries, and NumPy arrays.

data = {'Column1': [1, 2, 3], 'Column2': ['A', 'B', 'C']}
df = pd.DataFrame(data)
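The dictionary form above is only one option. As a rough sketch, the same constructor also accepts a list of rows or a NumPy array; the column names "x" and "y" below are placeholders chosen for this example.

```python
import numpy as np
import pandas as pd

# From a list of rows: column names are supplied separately.
rows = [[1, "A"], [2, "B"], [3, "C"]]
df_from_list = pd.DataFrame(rows, columns=["Column1", "Column2"])

# From a 2-D NumPy array: each array column becomes a DataFrame column.
array = np.array([[1, 2], [3, 4], [5, 6]])
df_from_array = pd.DataFrame(array, columns=["x", "y"])
```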

3. df.head(): Returns the first n rows of the DataFrame (default is 5).

df.head(3)  # Returns the first 3 rows of the DataFrame.

4. df.tail(): Returns the last n rows of the DataFrame (default is 5).

df.tail(2)  # Returns the last 2 rows of the DataFrame.

5. df.info(): Prints a concise summary of the DataFrame, including the index range, column names, non-null counts, data types, and memory usage.

df.info()

6. df.describe(): Generates descriptive statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for numeric columns by default.

df.describe()
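To illustrate the default behavior, here is a small sketch on the two-column example data. With no arguments, describe() summarizes only the numeric column; passing include="all" adds the text column as well.

```python
import pandas as pd

df = pd.DataFrame({"Column1": [1, 2, 3], "Column2": ["A", "B", "C"]})

# Default: only numeric columns are summarized.
numeric_stats = df.describe()

# include="all" also reports count, unique, top, and freq for text columns.
all_stats = df.describe(include="all")
```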

7. df.shape: An attribute (accessed without parentheses) holding a tuple with the dimensions of the DataFrame (rows, columns).

shape = df.shape  # Returns (3, 2) for a DataFrame with 3 rows and 2 columns.

8. df.columns: An attribute holding the column labels of the DataFrame as a pandas Index object; use list(df.columns) to get a plain Python list.

columns = df.columns
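Note that df.columns is a pandas Index rather than a plain Python list. The short sketch below shows how to convert it and how to rename a column; the new name "Number" is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({"Column1": [1, 2, 3], "Column2": ["A", "B", "C"]})

columns = df.columns                 # a pandas Index object
column_list = df.columns.tolist()    # a plain Python list of names

# rename returns a new DataFrame; the original df is left unchanged.
renamed = df.rename(columns={"Column1": "Number"})
```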

9. df['Column_Name']: Accesses a specific column in the DataFrame by its name, returning a Series.

column_data = df['Column1']

10. df[['Column1', 'Column2']]: Accesses multiple columns in the DataFrame, returning a new DataFrame.

subset = df[['Column1', 'Column2']]

11. df.loc[]: Selects rows and columns by labels. Unlike ordinary Python slicing, label slices include both endpoints.

subset = df.loc[0:2, ['Column1', 'Column2']]  # Selects rows labeled 0 through 2 (inclusive) and columns 'Column1' and 'Column2'.

12. df.iloc[]: Selects rows and columns by integer positions.

subset = df.iloc[0:2, 0:2]  # Selects rows 0 to 1 and columns 0 to 1.
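The difference between label-based and position-based selection is easiest to see side by side. The following sketch uses the example data plus a copy with a made-up index of 10, 20, 30.

```python
import pandas as pd

df = pd.DataFrame({"Column1": [1, 2, 3], "Column2": ["A", "B", "C"]})

# Label slice: both endpoints are included, so this yields rows 0, 1, and 2.
by_label = df.loc[0:2]

# Position slice: the end is excluded, so this yields rows 0 and 1 only.
by_position = df.iloc[0:2]

# With a non-default index, loc uses the labels, not the positions.
relabeled = pd.DataFrame(
    {"Column1": [1, 2, 3], "Column2": ["A", "B", "C"]}, index=[10, 20, 30]
)
first_two = relabeled.loc[10:20]

# loc also accepts a boolean mask to filter rows.
filtered = df.loc[df["Column1"] > 1, "Column2"]
```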

13. df.sort_values(): Sorts the DataFrame by one or more columns.

sorted_df = df.sort_values(by='Column1')
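Sorting is ascending by default. As a brief sketch with made-up values, the ascending parameter reverses the order, and a list of columns sorts by several keys at once.

```python
import pandas as pd

df = pd.DataFrame({"Column1": [2, 1, 2], "Column2": ["B", "C", "A"]})

# Largest values of Column1 first.
descending = df.sort_values(by="Column1", ascending=False)

# Sort by Column1 ascending, then break ties by Column2 descending.
multi = df.sort_values(by=["Column1", "Column2"], ascending=[True, False])
```

sort_values returns a new DataFrame and keeps the original index labels, so call reset_index(drop=True) afterwards if a fresh 0-based index is needed.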

14. df.groupby(): Groups data in the DataFrame by one or more columns for aggregation.

grouped = df.groupby('Category')['Value'].mean()  # Assumes df has 'Category' and 'Value' columns; returns the mean of 'Value' for each category.
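Since the earlier example DataFrame has no 'Category' or 'Value' columns, here is a self-contained sketch with invented data showing the same grouping, plus agg for computing several statistics in one pass.

```python
import pandas as pd

# Hypothetical data with the columns the groupby example expects.
df = pd.DataFrame(
    {"Category": ["a", "b", "a", "b"], "Value": [10, 20, 30, 40]}
)

# Mean of Value within each Category; the result is a Series indexed by Category.
means = df.groupby("Category")["Value"].mean()

# agg computes several aggregations at once, one result column per function.
summary = df.groupby("Category")["Value"].agg(["mean", "sum", "count"])
```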

15. df.drop(): Drops specified rows or columns from the DataFrame.

df.drop(0, axis=0, inplace=True)  # Drops the row with index label 0 (the first row under the default index).
df.drop('Column1', axis=1, inplace=True)  # Drops the column 'Column1'.
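As an alternative sketch, drop can also be called without inplace=True, in which case it returns a new DataFrame and leaves the original untouched. The index and columns keyword arguments state the intent more directly than axis.

```python
import pandas as pd

df = pd.DataFrame({"Column1": [1, 2, 3], "Column2": ["A", "B", "C"]})

# Drop the row labeled 0; df itself is not modified.
without_row = df.drop(index=0)

# Drop a column by name; again a new DataFrame is returned.
without_col = df.drop(columns="Column1")
```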

These are just a few of the many functions and methods available in Pandas, a versatile library with extensive capabilities for data manipulation, cleaning, analysis, and basic plotting. Learning to use them effectively can greatly improve your data processing and analysis workflows.