Extra Challenges
Last updated on 2023-05-18
A collection of challenges that have been either removed from or not (yet) added to the main lesson.
06-loops-and-functions
03-index-slice-subset
Additional slicing challenge
You can also select every Nth row by providing a third number inside
the [], e.g. surveys_df[1:10:2] returns every other row in the
DataFrame, from the second to the tenth:
OUTPUT
   record_id  month  day  year  plot_id  species_id  sex  hindfoot_length  weight
1          2      7   16  1977        3          NL    M             33.0     NaN
3          4      7   16  1977        7          DM    M             36.0     NaN
5          6      7   16  1977        1          PF    M             14.0     NaN
7          8      7   16  1977        1          DM    M             37.0     NaN
9         10      7   16  1977        6          PF    F             20.0     NaN
Given this, what do you think will happen when you run
surveys_df[::-1]? After you have predicted the result, run the code to
see if you were correct.
surveys_df[::-1] returns every row of the DataFrame, in reverse order.
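To check the slicing behaviour without loading the full dataset, the sketch below uses a small made-up DataFrame. The name toy_df and its ten rows are assumptions for illustration only and are not part of the lesson data.

```python
import pandas as pd

# Hypothetical stand-in for surveys_df: ten rows with record_id 1 to 10
toy_df = pd.DataFrame({"record_id": range(1, 11)})

# start:stop:step slicing selects rows by position
every_other = toy_df[1:10:2]
print(every_other["record_id"].tolist())  # [2, 4, 6, 8, 10]

# A negative step walks through the rows backwards
reversed_df = toy_df[::-1]
print(reversed_df["record_id"].tolist())  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

# The original index labels travel with the rows
print(reversed_df.index.tolist())  # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```

Note that reversing the rows does not renumber them: each row keeps its original index label.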
Looping Over DataFrame
The file surveys.csv in the data folder contains 25 years of data from
surveys, starting from 1977. We can extract data corresponding to each
year in this DataFrame to individual CSV files, by using a for loop:
PYTHON
import pandas as pd
# Load the data into a DataFrame
surveys_df = pd.read_csv('data/surveys.csv')
# Loop through a sequence of years and export selected data
start_year = 1977
end_year = 2002
for year in range(start_year, end_year+1):
    # Select data for the year
    surveys_year = surveys_df[surveys_df.year == year]
    # Write the new DataFrame to a CSV file
    filename = 'data/surveys' + str(year) + '.csv'
    surveys_year.to_csv(filename)
What happens if there is no data for a year in a sequence? For
example, imagine we used 1976 as the start_year.
We get the expected files for all years between 1977 and 2002, plus
an empty data/surveys1976.csv file with only the headers.
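One possible way to avoid the headers-only file is to skip any year whose selection is empty. The sketch below is not part of the original challenge: toy_df, out_dir and written are hypothetical names, and the data is a made-up miniature rather than the contents of surveys.csv. It writes to a temporary folder so that it can be run without touching the data folder.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical miniature of surveys_df; the real data comes from data/surveys.csv
toy_df = pd.DataFrame({
    "record_id": [1, 2, 3],
    "year": [1977, 1977, 1978],
})

# Temporary output folder (an assumption, used so the sketch is self-contained)
out_dir = Path(tempfile.mkdtemp())

written = []
for year in range(1976, 1978 + 1):
    # Select data for the year
    surveys_year = toy_df[toy_df.year == year]
    # Skip years with no records instead of writing a headers-only file
    if surveys_year.empty:
        continue
    filename = out_dir / ("surveys" + str(year) + ".csv")
    surveys_year.to_csv(filename)
    written.append(filename.name)

print(written)  # ['surveys1977.csv', 'surveys1978.csv']
```

The DataFrame.empty attribute is True when the selection has no rows, so no file is created for 1976 here.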