Pandas

Overview

Pandas is an open source Python library for data analysis and manipulation that is ubiquitous across data science and machine learning.

We can install pandas using pip (pip install pandas) and then import it. By convention, it is imported under the alias pd:

import pandas as pd

Data Types

Pandas has two incredibly important data types - the DataFrame and the Series.

DataFrames

A DataFrame is a table in which each entry sits at a particular row and column. For example, to create a table of individuals with their ages and heights:

df = pd.DataFrame({'Age': [22, 27], 'Height': [181, 173]})

When printed, it is displayed like this:

   Age  Height
0   22     181
1   27     173

We can label the rows by passing the index argument to the constructor:

df = pd.DataFrame({'Age': [22, 27], 'Height': [181, 173]}, index=['Alice', 'Bob'])

When printed, the labels now appear in place of 0 and 1:

       Age  Height
Alice   22     181
Bob     27     173
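Since every entry sits at a row and a column, we can look one up by its labels. A short sketch using .loc on the labelled table:

```python
import pandas as pd

# Same table as above, with the rows labelled by name
df = pd.DataFrame({'Age': [22, 27], 'Height': [181, 173]}, index=['Alice', 'Bob'])

# .loc looks up a single entry by row label, then column label
alice_height = df.loc['Alice', 'Height']
print(alice_height)  # 181
```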

Series

A Series is a one-dimensional sequence of data values, much like a list. We can create one from a list:
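As a minimal sketch (the values 5, 10 and 15 are made up for illustration), passing a list to pd.Series gives each entry a default integer label starting from 0, just like the rows of an unlabelled DataFrame:

```python
import pandas as pd

# A Series built from a plain Python list; the values are illustrative only
s = pd.Series([5, 10, 15])

# Without an explicit index, the labels default to 0, 1, 2
print(s)
```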

We can also give each entry a label and give the Series as a whole a name:
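A sketch reusing the heights from the earlier table: the index argument supplies the per-entry labels, and the name argument names the Series as a whole.

```python
import pandas as pd

# index= labels each entry; name= names the whole Series
heights = pd.Series([181, 173], index=['Alice', 'Bob'], name='Height')

print(heights['Bob'])  # 173
print(heights.name)    # Height
```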

We can think of a DataFrame as a collection of Series "glued" together side by side, one per column, and this picture will help us when it comes to manipulating data later!
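To see the "glued together" picture in code, here is a small sketch using the same table: selecting one column gives back a Series, and a dict of Series that share the same labels builds the DataFrame again.

```python
import pandas as pd

df = pd.DataFrame({'Age': [22, 27], 'Height': [181, 173]}, index=['Alice', 'Bob'])

# Pulling out a single column returns a Series that keeps the row labels
ages = df['Age']
print(type(ages).__name__)  # Series

# Going the other way: Series with matching labels combine into a DataFrame,
# one column per Series, lined up by their index
ages_series = pd.Series([22, 27], index=['Alice', 'Bob'])
heights_series = pd.Series([181, 173], index=['Alice', 'Bob'])
rebuilt = pd.DataFrame({'Age': ages_series, 'Height': heights_series})
print(rebuilt)
```

Because the Series are aligned by label rather than by position, the rows of the rebuilt table match up even if the Series list their entries in different orders.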

Using Files

We can read data straight from files using functions such as read_csv, which loads a CSV file into a DataFrame. The DataFrame method to_csv does the reverse, saving a DataFrame to a CSV file. Equivalents exist for other file formats, for example read_excel/to_excel and read_json/to_json.
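A minimal round trip, assuming the labelled table from earlier. The filename people.csv is hypothetical, and the file is placed in a temporary directory (via Python's tempfile module) so the sketch leaves nothing behind:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Age': [22, 27], 'Height': [181, 173]}, index=['Alice', 'Bob'])

with tempfile.TemporaryDirectory() as tmp:
    # 'people.csv' is a hypothetical filename used only for this sketch
    path = os.path.join(tmp, 'people.csv')

    # to_csv writes the row labels as the first column by default
    df.to_csv(path)

    # index_col=0 tells read_csv to turn that first column back into row labels
    loaded = pd.read_csv(path, index_col=0)

print(loaded)
```

Without index_col=0, read_csv would treat the saved row labels as an ordinary column named "Unnamed: 0".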
