pandas notes 3

get single row values as list in pandas

To get the values of a single row in a pandas DataFrame as a list, you can use either the .iloc[] indexer or the .loc[] indexer. Here's how you can do it:

Using .iloc[] indexer (index-based selection):

import pandas as pd

# Sample DataFrame
data = {'Name': ['John', 'Alice', 'Bob', 'Emma', 'Mike'],
        'Age': [30, 25, 40, 35, 28],
        'Gender': ['Male', 'Female', 'Male', 'Female', 'Male']}
df = pd.DataFrame(data)

# Get values of the first row as a list
first_row_values = df.iloc[0].tolist()

print(first_row_values)

Using .loc[] indexer (label-based selection):

# Get values of the first row as a list
first_row_values = df.loc[0].tolist()

print(first_row_values)

Both of these methods will output the values of the first row of the DataFrame df as a list:

['John', 30, 'Male']

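Replace 0 with the row you want: .iloc[] takes an integer position (0 for the first row), while .loc[] takes an index label. The two coincide here only because df uses the default integer index. In both cases the selected row is a pandas Series, and .tolist() converts it into a plain Python list.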
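The difference between position and label shows up as soon as the index is not the default 0, 1, 2, ... range. Here is a minimal sketch, assuming a hypothetical DataFrame whose rows carry the made-up labels 'a' and 'b':

```python
import pandas as pd

# Hypothetical data with string labels instead of the default integer index
df = pd.DataFrame(
    {'Name': ['John', 'Alice'], 'Age': [30, 25]},
    index=['a', 'b'],
)

by_position = df.iloc[1].tolist()  # second row, counted from 0
by_label = df.loc['b'].tolist()    # row whose index label is 'b'

print(by_position)
print(by_label)
```

Both calls return the values of the same row. With this index, df.loc[1] would raise a KeyError, because 1 is not one of the labels.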

show a single row and columns in a table format + pandas

To display a single row and selected columns in a table format in pandas, you can use the iloc indexer to select the row by its index and specify the columns you want to display. Then, you can convert the resulting DataFrame into a formatted table using the to_string() method. Here's how you can do it:

import pandas as pd

# Sample DataFrame
data = {'Name': ['John', 'Alice', 'Bob', 'Emma', 'Mike'],
        'Age': [30, 25, 40, 35, 28],
        'Gender': ['Male', 'Female', 'Male', 'Female', 'Male']}
df = pd.DataFrame(data)

# Select the first row and specific columns
row_index = 0  # Index of the row to display
selected_columns = ['Name', 'Age']  # Columns to display

# Create a DataFrame with the selected row and columns
selected_row_df = df.iloc[[row_index]][selected_columns]

# Convert the DataFrame to a formatted table
table = selected_row_df.to_string(index=False)

print(table)

Output:

 Name  Age
 John   30

In this example, row_index specifies the index of the row you want to display, and selected_columns specifies the columns you want to include in the table. We use the iloc indexer to select the specified row by its index and the specified columns. Then, we convert the resulting DataFrame into a formatted table using the to_string() method with index=False to exclude the row index.
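The row and the columns can also be picked in a single step. The sketch below is one possible variant that uses .loc[] with a list of row labels and a list of column names; it assumes the default integer index, so the label 0 is the first row.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 40],
                   'Gender': ['Male', 'Female', 'Male']})

# The double brackets around the row label keep the result a DataFrame;
# a single label (df.loc[0, ...]) would return a Series instead
one_row = df.loc[[0], ['Name', 'Age']]

print(type(one_row).__name__)
print(one_row.to_string(index=False))
```

Keeping the result a DataFrame is what makes it print as a table with a header line.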

loop by rows in pandas

You can loop through rows in a pandas DataFrame using the iterrows() method. Here's how you can do it:

import pandas as pd

# Sample DataFrame
data = {'Name': ['John', 'Alice', 'Bob', 'Emma', 'Mike'],
        'Age': [30, 25, 40, 35, 28],
        'Gender': ['Male', 'Female', 'Male', 'Female', 'Male']}
df = pd.DataFrame(data)

# Loop through rows
for index, row in df.iterrows():
    print("Row index:", index)
    print("Values:", row)
    print()

This will output each row in the DataFrame df along with its index:

Row index: 0
Values: Name      John
Age         30
Gender    Male
Name: 0, dtype: object

Row index: 1
Values: Name       Alice
Age           25
Gender    Female
Name: 1, dtype: object

Row index: 2
Values: Name       Bob
Age         40
Gender    Male
Name: 2, dtype: object

Row index: 3
Values: Name        Emma
Age           35
Gender    Female
Name: 3, dtype: object

Row index: 4
Values: Name      Mike
Age         28
Gender    Male
Name: 4, dtype: object

In each iteration of the loop, index represents the index of the current row, and row represents the values of the current row as a pandas Series. You can access individual values within the row using dictionary-like indexing, such as row['Name'], row['Age'], etc.
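If you only need the values, itertuples() is a common alternative to iterrows(): it yields each row as a namedtuple, which is generally faster and does not convert the whole row to a single dtype. A minimal sketch with a shortened sample DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 40]})

# Each row arrives as a namedtuple; index=False leaves out the row label
summaries = []
for row in df.itertuples(index=False):
    summaries.append(f"{row.Name} is {row.Age}")

print(summaries)
```

Fields are read as attributes (row.Name) rather than with dictionary-like indexing, so this works best when the column names are valid Python identifiers.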

create dataframe pandas

To create a pandas DataFrame, you can pass a dictionary of lists or arrays to the pd.DataFrame() constructor. Each key in the dictionary will become a column name, and the corresponding list or array will become the data for that column. Here's how you can create a simple DataFrame:

import pandas as pd

# Sample data
data = {
    'Name': ['John', 'Alice', 'Bob', 'Emma', 'Mike'],
    'Age': [30, 25, 40, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male']
}

# Create DataFrame
df = pd.DataFrame(data)

# Display DataFrame
print(df)

This will output:

    Name  Age  Gender
0   John   30    Male
1  Alice   25  Female
2    Bob   40    Male
3   Emma   35  Female
4   Mike   28    Male

In this example, each key in the data dictionary corresponds to a column in the DataFrame, and the lists associated with each key provide the data for that column. The DataFrame df is created by passing this dictionary to the pd.DataFrame() constructor.
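A dictionary of lists is not the only accepted input. When the data arrives row by row, for example as parsed JSON, a list of dictionaries also works. The sketch below uses made-up records, one of which lacks a key on purpose:

```python
import pandas as pd

records = [
    {'Name': 'John', 'Age': 30},
    {'Name': 'Alice', 'Age': 25},
    {'Name': 'Bob'},  # no 'Age' key, so pandas fills in NaN
]

# Each dictionary becomes one row; the keys become the column names
df = pd.DataFrame(records)

print(df)
```

Because of the missing value, the Age column is stored as float rather than int.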

get columns as list pandas

To get the columns of a pandas DataFrame as a list, you can access the columns attribute of the DataFrame. Here's how you can do it:

import pandas as pd

# Sample DataFrame
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [30, 25, 40],
        'Gender': ['Male', 'Female', 'Male']}
df = pd.DataFrame(data)

# Get columns as a list
columns_list = df.columns.tolist()

print("Columns as a list:", columns_list)

Output:

Columns as a list: ['Name', 'Age', 'Gender']

In this example, df.columns returns a pandas Index object representing the columns of the DataFrame df, and .tolist() converts this Index object into a Python list containing the column names.

consider second row as header + pandas

To treat the second row as the header, skip everything above it and use that row for the column names. When reading from a file, pass header=1 to pd.read_csv() or pd.read_excel() (rows are counted from 0). When the data is already in memory as a list of rows, slice the list instead. Here's how you can do it:

import pandas as pd

# Sample data: the first row is a title line, the second row is the header
data = [
    ['Staff list', None, None],
    ['Name', 'Age', 'Gender'],
    ['John', 30, 'Male'],
    ['Alice', 25, 'Female'],
    ['Bob', 40, 'Male']
]

# Use the second row (data[1]) as the header and the rows after it as data
df = pd.DataFrame(data[2:], columns=data[1])

print(df)

Output:

    Name  Age  Gender
0   John   30    Male
1  Alice   25  Female
2    Bob   40    Male

In this example, data is a list of lists where each inner list represents a row. The first row (data[0]) is discarded, the second row (data[1]) is passed as the columns parameter of pd.DataFrame(), and the remaining rows (data[2:]) become the data of the DataFrame.
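When the rows come from a file rather than from a list, the header parameter does the same job. Here is a minimal sketch that reads a made-up CSV from an in-memory string, so it runs without a file on disk; pd.read_excel() accepts the same header parameter.

```python
import io
import pandas as pd

# Hypothetical file contents: a title line, then the real header row
csv_text = """Staff report,,
Name,Age,Gender
John,30,Male
Alice,25,Female
"""

# header=1 means "use the second line (0-based row 1) as column names";
# the line above it is skipped
df = pd.read_csv(io.StringIO(csv_text), header=1)

print(df)
```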

Find ms col index

If you want to find the index of a specific column in a pandas DataFrame based on its name, you can use the .get_loc() method of the DataFrame's columns. Here's how you can do it:

import pandas as pd

# Sample DataFrame
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [30, 25, 40],
        'Gender': ['Male', 'Female', 'Male']}
df = pd.DataFrame(data)

# Name of the column whose index you want to find
column_name = 'Age'

# Get the index of the column
column_index = df.columns.get_loc(column_name)

print("Index of column '{}': {}".format(column_name, column_index))

Output:

Index of column 'Age': 1

In this example, column_name is the name of the column whose index we want to find. We use the .get_loc() method on df.columns to get the index of the column with the specified name. The resulting column_index variable contains the index of the column. Note that indexing in pandas starts from 0, so the index of the first column is 0, the index of the second column is 1, and so on.
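A column position is mostly useful as input to .iloc[]. The sketch below shows that, and also looks up several names at once with get_indexer(); the name 'Missing' is deliberately absent from the DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 40],
                   'Gender': ['Male', 'Female', 'Male']})

age_pos = df.columns.get_loc('Age')   # single name -> integer position
ages = df.iloc[:, age_pos].tolist()   # select that column by position

# get_indexer looks up several names at once; unknown names come back as -1
positions = df.columns.get_indexer(['Gender', 'Name', 'Missing']).tolist()

print(age_pos, ages, positions)
```

Unlike get_indexer(), get_loc() raises a KeyError when the name does not exist. It returns a plain integer only when the column name is unique.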

get row by index in pandas

To get a specific row by its index in a pandas DataFrame, you can use the .iloc[] indexer. Here's how you can do it:

import pandas as pd

# Sample DataFrame
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [30, 25, 40],
        'Gender': ['Male', 'Female', 'Male']}
df = pd.DataFrame(data)

# Get the row with index 1
row_index = 1
row = df.iloc[row_index]

print("Row with index {}: \n{}".format(row_index, row))

Output:

Row with index 1: 
Name       Alice
Age           25
Gender    Female
Name: 1, dtype: object

In this example, row_index is the integer position of the row you want to retrieve, counted from 0. The .iloc[] indexer selects by position; to select by index label instead, use .loc[]. The resulting row variable contains the data for the specified row as a pandas Series.
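Two details of positional access are worth knowing: negative positions count from the end, and a position outside the DataFrame raises an IndexError. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 40]})

last_row = df.iloc[-1]   # negative positions count from the end
print(last_row['Name'])

try:
    df.iloc[10]          # only positions 0 to 2 exist
except IndexError as exc:
    print("No such row:", exc)
```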

get row by multiple index in pandas

If you want to get multiple rows by their indices in a pandas DataFrame, you can pass a list of indices to the .iloc[] indexer. Here's how you can do it:

import pandas as pd

# Sample DataFrame
data = {'Name': ['John', 'Alice', 'Bob', 'Emma', 'Mike'],
        'Age': [30, 25, 40, 35, 28],
        'Gender': ['Male', 'Female', 'Male', 'Female', 'Male']}
df = pd.DataFrame(data)

# Get rows with indices 1 and 3
row_indices = [1, 3]
rows = df.iloc[row_indices]

print("Rows with indices {}: \n{}".format(row_indices, rows))

Output:

Rows with indices [1, 3]: 
    Name  Age  Gender
1  Alice   25  Female
3   Emma   35  Female

In this example, row_indices is a list containing the indices of the rows you want to retrieve. We pass this list to the .iloc[] indexer, which returns the rows with the specified indices. The resulting rows variable contains the selected rows as a new DataFrame.
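Rows can also be picked by value instead of by position. The sketch below shows two possible ways, assuming the names are unique: making Name the index and selecting labels with .loc[], or keeping the index and filtering with a boolean mask built by isin().

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob', 'Emma'],
                   'Age': [30, 25, 40, 35]})

# Option 1: index the rows by name, then select labels with .loc[]
by_name = df.set_index('Name')
picked = by_name.loc[['Alice', 'Emma']]

# Option 2: keep the original index and filter with a boolean mask
masked = df[df['Name'].isin(['Alice', 'Emma'])]

print(picked)
print(masked)
```

The first option returns rows in the order the labels are listed; the second keeps the original row order.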

pandas iloc with multiple index

If you want to select specific rows and columns using .iloc[] indexer in pandas, you can pass lists of indices for rows and columns. Here's how you can do it:

import pandas as pd

# Sample DataFrame
data = {'Name': ['John', 'Alice', 'Bob', 'Emma', 'Mike'],
        'Age': [30, 25, 40, 35, 28],
        'Gender': ['Male', 'Female', 'Male', 'Female', 'Male']}
df = pd.DataFrame(data)

# Select specific rows and columns by index
row_indices = [1, 3]  # Row indices
col_indices = [0, 2]  # Column indices

selected_data = df.iloc[row_indices, col_indices]

print("Selected data:")
print(selected_data)

Output:

Selected data:
    Name  Gender
1  Alice  Female
3   Emma  Female

In this example, row_indices and col_indices are lists containing the indices of the rows and columns you want to select, respectively. We pass these lists to the .iloc[] indexer, specifying both rows and columns using the syntax .iloc[row_indices, col_indices]. The resulting selected_data DataFrame contains the data from the specified rows and columns.
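Often the rows are known by position but the columns by name. One way to combine the two, sketched here under the assumption that the index labels are unique, is to translate the row positions into labels and hand everything to .loc[]:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob', 'Emma', 'Mike'],
                   'Age': [30, 25, 40, 35, 28],
                   'Gender': ['Male', 'Female', 'Male', 'Female', 'Male']})

row_positions = [1, 3]
column_names = ['Name', 'Gender']

# df.index[row_positions] turns positions into labels, so .loc[] can take
# both the rows and the named columns in one call
selected = df.loc[df.index[row_positions], column_names]

print(selected)
```

The result holds the same rows and columns as df.iloc[[1, 3], [0, 2]].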

row from and to iloc + pandas

To select a range of rows with the .iloc[] indexer, pass a slice of the form start:stop. Rows are counted from 0 and the stop position is excluded, so 1:4 selects the rows at positions 1, 2, and 3. Here's how you can do it:

import pandas as pd

# Sample DataFrame
data = {'Name': ['John', 'Alice', 'Bob', 'Emma', 'Mike'],
        'Age': [30, 25, 40, 35, 28],
        'Gender': ['Male', 'Female', 'Male', 'Female', 'Male']}
df = pd.DataFrame(data)

# Select rows from position 1 up to, but not including, position 4
start, stop = 1, 4

selected_data = df.iloc[start:stop]

print("Selected data:")
print(selected_data)

Output:

Selected data:
    Name  Age  Gender
1  Alice   25  Female
2    Bob   40    Male
3   Emma   35  Female

get specific column values as list + pandas

To get the values of a specific column as a list in pandas, you can access the column using its name and then use the .tolist() method. Here's how you can do it:

import pandas as pd

# Sample DataFrame
data = {'Name': ['John', 'Alice', 'Bob', 'Emma', 'Mike'],
        'Age': [30, 25, 40, 35, 28],
        'Gender': ['Male', 'Female', 'Male', 'Female', 'Male']}
df = pd.DataFrame(data)

# Get values of the 'Name' column as a list
name_list = df['Name'].tolist()

print("Values of 'Name' column as a list:", name_list)

Output:

Values of 'Name' column as a list: ['John', 'Alice', 'Bob', 'Emma', 'Mike']

In this example, df['Name'] accesses the 'Name' column of the DataFrame df, and .tolist() converts the values of this column into a Python list. You can replace 'Name' with the name of any column you want to retrieve the values for.
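If you only want each distinct value once, unique() can be called before tolist(). A short sketch with a reduced sample DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Gender': ['Male', 'Female', 'Male']})

# unique() keeps the order of first appearance
genders = df['Gender'].unique().tolist()

print(genders)
```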

apply convert small case + pandas

You can use the apply() function in pandas to apply a function to each element of a Series. To convert all values in a column to lowercase, you can use the str.lower() method within apply(). Here's how you can do it:

import pandas as pd

# Sample DataFrame
data = {'Name': ['John', 'Alice', 'Bob', 'Emma', 'Mike'],
        'Age': [30, 25, 40, 35, 28],
        'Gender': ['Male', 'Female', 'Male', 'Female', 'Male']}
df = pd.DataFrame(data)

# Convert values of the 'Name' column to lowercase
df['Name'] = df['Name'].apply(str.lower)

print(df)

Output:

    Name  Age  Gender
0   john   30    Male
1  alice   25  Female
2    bob   40    Male
3   emma   35  Female
4   mike   28    Male

In this example, df['Name'].apply(str.lower) applies the str.lower() method to each element of the 'Name' column, converting all values to lowercase. Then, the result is assigned back to the 'Name' column in the DataFrame df. You can replace 'Name' with the name of any column you want to convert to lowercase.
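For string columns, the .str accessor is usually the more robust choice, because apply(str.lower) fails as soon as the column contains a missing value. A minimal sketch, using a made-up Series with one missing entry:

```python
import pandas as pd

# Hypothetical column with a missing entry
names = pd.Series(['John', None, 'Bob'])

lowered = names.str.lower()   # the missing entry stays missing
print(lowered.tolist())

try:
    names.apply(str.lower)    # str.lower cannot be called on a missing value
except TypeError as exc:
    print("apply failed:", exc)
```

On a DataFrame column the call looks the same: df['Name'] = df['Name'].str.lower().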

pandas drop specific column

To drop a specific column from a DataFrame in pandas, you can use the .drop() method with the axis parameter set to 1. Here's how you can do it:

import pandas as pd

# Sample DataFrame
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [30, 25, 40],
        'Gender': ['Male', 'Female', 'Male']}
df = pd.DataFrame(data)

# Drop the 'Gender' column
df = df.drop('Gender', axis=1)

print(df)

Output:

    Name  Age
0   John   30
1  Alice   25
2    Bob   40

In this example, df.drop('Gender', axis=1) drops the 'Gender' column from the DataFrame df. The axis=1 parameter specifies that we want to drop a column (as opposed to dropping a row, which would be axis=0). The resulting DataFrame df does not contain the 'Gender' column.
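The same method accepts a columns keyword, which avoids the axis argument and takes a list when several columns should go. The sketch below also passes errors='ignore' so that a name that is not present is skipped silently; 'Salary' is a made-up column that does not exist in the sample data.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 40],
                   'Gender': ['Male', 'Female', 'Male']})

# 'Salary' does not exist; errors='ignore' keeps drop() from raising KeyError
trimmed = df.drop(columns=['Gender', 'Salary'], errors='ignore')

print(trimmed.columns.tolist())
```

drop() returns a new DataFrame, so df itself still contains the 'Gender' column unless the result is assigned back.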

how-to-check-if-any-value-is-nan-in-a-pandas-dataframe

You can check if any value is NaN (missing) in a pandas DataFrame using the .isna() method, followed by the .any() method. Here's how you can do it:

import pandas as pd

# Sample DataFrame
data = {'Name': ['John', 'Alice', 'Bob', 'Emma', 'Mike'],
        'Age': [30, None, 40, 35, 28],
        'Gender': ['Male', 'Female', 'Male', 'Female', 'Male']}
df = pd.DataFrame(data)

# Check if any value is NaN in the DataFrame
is_nan = df.isna().any().any()

if is_nan:
    print("DataFrame contains NaN values.")
else:
    print("DataFrame does not contain NaN values.")

Output:

DataFrame contains NaN values.

In this example, df.isna() returns a DataFrame of the same shape as df, where each element is True if the corresponding element in df is NaN, and False otherwise. Then, .any().any() checks if there is any True value in the resulting DataFrame, indicating the presence of NaN values. If any NaN value is found, the condition is_nan will be True, otherwise, it will be False.
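Once you know that NaN values exist, the next question is usually where. A minimal sketch that counts the missing values per column and pulls out the affected rows:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, None, 40]})

nan_per_column = df.isna().sum()            # missing values in each column
rows_with_nan = df[df.isna().any(axis=1)]   # rows with at least one NaN

print(nan_per_column)
print(rows_with_nan)
```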

exclude "Unnamed" columns in pandas

When reading a CSV file into a pandas DataFrame, columns named "Unnamed: 0", "Unnamed: 1", and so on sometimes appear. They usually come from header cells that are empty, for example because of a trailing comma or because the file was written with its row index included. To exclude these columns, you can filter out the columns whose names start with "Unnamed". Here's how you can do it:

import pandas as pd

# Sample DataFrame with unnamed columns
data = {'Name': ['John', 'Alice', 'Bob'],
        'Unnamed: 0': [30, 25, 40],
        'Unnamed: 1': ['Male', 'Female', 'Male'],
        'Age': [30, 25, 40],
        'Gender': ['Male', 'Female', 'Male']}
df = pd.DataFrame(data)

# Exclude "Unnamed" columns
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]

print(df)

Output:

    Name  Age  Gender
0   John   30    Male
1  Alice   25  Female
2    Bob   40    Male

In this example, df.columns.str.contains('^Unnamed') returns a boolean mask that is True for each column whose name starts with "Unnamed" (the ^ anchors the match to the beginning of the name). We use ~ to invert this mask, so that it is True for all other columns. Then, we use .loc[] to select columns based on this mask, effectively excluding the "Unnamed" columns.
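A frequent source of an "Unnamed: 0" column is a CSV that was saved with its row index. In that case the column can be avoided at read time instead of being filtered out afterwards. The sketch below reproduces the situation with an in-memory string, so no file is needed:

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice'], 'Age': [30, 25]})

# to_csv() writes the row index as a first column with an empty header
csv_text = df.to_csv()

raw = pd.read_csv(io.StringIO(csv_text))                 # index shows up as a column
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)  # first column becomes the index

print(list(raw.columns))
print(list(clean.columns))
```

Writing the file with df.to_csv(index=False) prevents the extra column from being created in the first place.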

loop pandas rows

You can loop through rows in a pandas DataFrame using the .iterrows() method. Here's how you can do it:

import pandas as pd

# Sample DataFrame
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [30, 25, 40],
        'Gender': ['Male', 'Female', 'Male']}
df = pd.DataFrame(data)

# Loop through rows
for index, row in df.iterrows():
    print("Row index:", index)
    print("Values:", row)
    print()

Output:

Row index: 0
Values: Name      John
Age         30
Gender    Male
Name: 0, dtype: object

Row index: 1
Values: Name       Alice
Age           25
Gender    Female
Name: 1, dtype: object

Row index: 2
Values: Name       Bob
Age         40
Gender    Male
Name: 2, dtype: object
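
Many row loops can be replaced by operations on whole columns, which is usually shorter and faster. As a sketch, assuming the goal of the loop is to build one label per row:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 40]})

# Column-wise string concatenation; no explicit loop over rows
labels = (df['Name'] + ' (' + df['Age'].astype(str) + ')').tolist()

print(labels)
```

A loop with iterrows() remains a reasonable choice when each row triggers a side effect, such as writing a file or calling an external service.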

read_excel with column data format + pandas

When reading an Excel file with read_excel in pandas, you can specify the data types of columns using the dtype parameter. Here's how you can do it:

import pandas as pd

# Read Excel file with specified column data types
df = pd.read_excel('your_excel_file.xlsx', dtype={'Column1': int, 'Column2': str, 'Column3': float})

# Display DataFrame
print(df)

Replace 'your_excel_file.xlsx' with the path to your Excel file, and specify the column names and their corresponding data types in the dtype parameter. In the example above, 'Column1' is set as integer (int), 'Column2' as string (str), and 'Column3' as float (float).

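Specifying the data types keeps pandas from inferring an unwanted type. A common case is an ID or postal-code column that should stay a string so that leading zeros are preserved. Note that a plain int column cannot hold missing values, so a column with empty cells needs float or the nullable 'Int64' dtype instead. Date columns are handled by the separate parse_dates parameter rather than by dtype.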
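
The effect of dtype is easiest to see with leading zeros. The sketch below uses pd.read_csv() on an in-memory string so that it runs without an Excel file; read_csv() accepts the same kind of dtype mapping as read_excel(). The column names ID and Score are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical data: IDs with leading zeros
csv_text = """ID,Score
007,1.5
042,2
"""

inferred = pd.read_csv(io.StringIO(csv_text))                    # ID inferred as integer
as_text = pd.read_csv(io.StringIO(csv_text), dtype={'ID': str})  # ID kept as text

print(inferred['ID'].tolist())
print(as_text['ID'].tolist())
```

With the inferred type the leading zeros are lost; with dtype={'ID': str} the values stay exactly as written in the file.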

