The Kerala Bird Atlas (KBA), the first-of-its-kind state-level bird atlas in India, has created solid baseline data about the distribution and abundance of various bird species across all major habitats, giving an impetus for future studies.
The entire state of Kerala was systematically surveyed twice a year during 2015–20. It is arguably Asia’s largest bird atlas in terms of geographical extent, sampling effort and species coverage derived from the aggregation of 25,000 checklists.
KBA accounted for nearly three lakh records of 361 species, including 94 very rare species, 103 rare species, 110 common species, 44 very common species, and 10 most abundant species.
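As a quick consistency check, the five category counts quoted above should add up to the 361 species total. A minimal sketch (the name `category_counts` is only illustrative):

```python
# Species counts per abundance category, as quoted in the text above
category_counts = {
    "very rare": 94,
    "rare": 103,
    "common": 110,
    "very common": 44,
    "most abundant": 10,
}

total_species = sum(category_counts.values())
print(total_species)  # 361

# Share of each category in the species list, in percent
for name, count in category_counts.items():
    print(f"{name:>13}: {count / total_species:6.1%}")
```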
Citizen-science driven exercises (e.g. bird surveys) and online platforms (e.g. eBird) provide voluminous data on bird occurrence. However, the semi-structured nature of their data collection makes it difficult to compare bird distribution across space and time.
Data on the distribution of species and the factors governing it are prerequisites for effective and efficient conservation efforts. Such information is necessary to inform the selection of protected areas, to assess habitat associations and to predict the likely effects of future environmental changes.
Kerala lies between 8°18' N and 12°48' N lat. and 74°52' E and 77°22' E long. in southwestern India. Wedged between the Arabian Sea and the windward side of the Western Ghats, it receives abundant rainfall (180–360 cm) and experiences a tropical climate. Elevation in the region varies from –2.2 m (Kuttanad) to 2695 m (Anamudi peak). It is spread across an area of 38,863 sq. km, of which 27% is under forest cover, 66% is under cultivation and 7% constitutes built-up areas/wetlands/uncultivated land. One-fourth of the Western Ghats range falls within Kerala. Surveys for KBA were conducted in all 14 administrative units (districts) of Kerala.
Kerala was divided into cells of size 3.75 min x 3.75 min (equivalent to 6.6 km x 6.6 km) aligned to Survey of India maps. A total of 915 cells were laid out covering the entire state. Each cell was further divided into four quadrants of size 3.3 km x 3.3 km. Each quadrant was then sub-divided into 9 sub-cells of size 1.1 km x 1.1 km. A single, randomly selected sub-cell in every quadrant was chosen for the survey. Grids were laid and the randomly selected sub-cells were marked on the map prior to the survey. A total of 63 sub-cells were found to be located in inaccessible cliffs or valleys, and these were replaced by adjacent accessible sub-cells with the same habitat type from the same quadrants.
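The sampling design above (cell, then quadrant, then one random sub-cell) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `pick_subcells`, the 1-based IDs and the fixed seed are hypothetical, and it assumes every one of the 915 cells contributes four quadrants, which may not hold for cells on the state border.

```python
import random

QUADRANTS_PER_CELL = 4      # each 6.6 km cell is split into four 3.3 km quadrants
SUBCELLS_PER_QUADRANT = 9   # each quadrant is split into nine 1.1 km sub-cells

def pick_subcells(n_cells, seed=0):
    """Return one randomly chosen sub-cell per quadrant as (cell, quadrant, sub_cell).

    IDs are 1-based and purely illustrative; the atlas's real numbering may differ.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    selection = []
    for cell in range(1, n_cells + 1):
        for quadrant in range(1, QUADRANTS_PER_CELL + 1):
            sub_cell = rng.randint(1, SUBCELLS_PER_QUADRANT)
            selection.append((cell, quadrant, sub_cell))
    return selection

selected = pick_subcells(n_cells=915)
print(len(selected))   # 915 cells x 4 quadrants = 3660 sub-cells
print(selected[:3])
```

Replacing an inaccessible sub-cell with an adjacent one of the same habitat type, as described above, needs habitat information and is left out of this sketch.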
This observation dataset contains 3 files: kba_data.rds, kba_names.rds and kba_species.details.rds.
Reference : Bird Count India, The Hindu, Times of India, Mysore Bird Atlas
# How to deal with RDS files in Python 🐍?
RDS is a file format for storing R objects, and our data is stored in this format. The .rds file extension is used mostly in R.
Saving data in R data formats can considerably reduce the size of large files.
In Python, the pyreadr package can read these files. It installs easily with pip:
!pip install pyreadr
import pyreadr
result = pyreadr.read_r('../path_to_file.rds') # also works for RData
print(f"Class type of 'result' \t:: {type(result)}")
df = result[None] # extract the pandas data frame
print(f"Class type of 'df' \t:: {type(df)}")
>>> Class type of 'result' :: <class 'collections.OrderedDict'>
>>> Class type of 'df' :: <class 'pandas.core.frame.DataFrame'>
Reference : StackoverFlow, GitHub
Date : The field surveys were conducted from 2015 to 2020.
Season : The field surveys were conducted twice a year, i.e., during the wet and dry seasons.
n.observers (number of observers) : Each atlas survey group usually had 2–5 volunteers, but on some occasions more volunteers were involved in the survey (2–13).
We calculated the endemic score of every cell based on the number of endemic species reported from it.
We calculated the threat and SoIB (State of India’s Birds) score for every cell. SoIB utilized the eBird data to estimate indices of population trends and range size for 867 of India’s 1333 bird species.
Threat categories were based on the IUCN Red List 32 and were scored as follows:
NOTE :
# !pip install pyreadr # https://github.com/ofajardo/pyreadr#installation
import pyreadr
import re
import numpy as np
import pandas as pd
from os import listdir
from operator import itemgetter
import seaborn as sns
import matplotlib.pyplot as plt
from tqdm import tqdm # For progress bar
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')
# Settings for pretty nice plots
# https://matplotlib.org/3.5.1/gallery/style_sheets/style_sheets_reference.html
plt.style.use('fivethirtyeight')
plt.show()
# For highlighting text on 'print' command
class text_co:
    reset = '\u001b[0m'
    blue = '\u001b[34;1m'
    red = '\u001b[31;1m'
    light_blue = '\u001b[34m'
# List of files available
print(listdir('../Kerala_Birds/data_files'))
['kba_species.details.rds', 'kba_names.rds', 'kba_data.rds']
# As mentioned above, we use "pyreadr" to read ".rds" files.
PATH = '../Kerala_Birds/data_files/'
kba_data = pyreadr.read_r(PATH + 'kba_data.rds')
kba_names = pyreadr.read_r(PATH + 'kba_names.rds')
kba_species = pyreadr.read_r(PATH + 'kba_species.details.rds')
kba_data_df = kba_data[None]
kba_names_df = kba_names[None]
kba_species_df = kba_species[None]
# Let's check and confirm all are Pandas dataframes
print(f"Class type of 'kba_data_df' is : {type(kba_data_df)}")
print(f"Class type of 'kba_names_df' is : {type(kba_names_df)}")
print(f"Class type of 'kba_species_df' is : {type(kba_species_df)}")
Class type of 'kba_data_df' is : <class 'pandas.core.frame.DataFrame'>
Class type of 'kba_names_df' is : <class 'pandas.core.frame.DataFrame'>
Class type of 'kba_species_df' is : <class 'pandas.core.frame.DataFrame'>
# Let's check the size of all 3 datasets before starting the exploration
print(f"Size of 'kba_data_df' : {text_co.blue}{kba_data_df.shape}\n")
kba_data_df.head()
Size of 'kba_data_df' : (300882, 10)
| | Common.Name | Date | Time | n.observers | County | Sub.cell | Season | DEM | Cell.ID | List.ID |
---|---|---|---|---|---|---|---|---|---|---|
0 | Asian Koel | 7/16/2015 | 16:30 | 2.0 | Alappuzha | [51,2,2] | Wet | 5.0 | [76.28,9.84] | List.1 |
1 | Black-rumped Flameback | 7/16/2015 | 16:30 | 2.0 | Alappuzha | [51,2,2] | Wet | 5.0 | [76.28,9.84] | List.1 |
2 | Black Drongo | 7/16/2015 | 16:30 | 2.0 | Alappuzha | [51,2,2] | Wet | 5.0 | [76.28,9.84] | List.1 |
3 | Brahminy Kite | 7/16/2015 | 16:30 | 2.0 | Alappuzha | [51,2,2] | Wet | 5.0 | [76.28,9.84] | List.1 |
4 | Common Myna | 7/16/2015 | 16:30 | 2.0 | Alappuzha | [51,2,2] | Wet | 5.0 | [76.28,9.84] | List.1 |
print(f"Size of 'kba_names_df' : {text_co.blue}{kba_names_df.shape}\n")
kba_names_df.head()
Size of 'kba_names_df' : (492, 4)
| | Common.Name | Scientific.Name | Action | Assigned.Name.for.Atlas |
---|---|---|---|---|
0 | Asian Koel | Eudynamys scolopaceus | | Asian Koel |
1 | Black-rumped Flameback | Dinopium benghalense | | Black-rumped Flameback |
2 | Black Drongo | Dicrurus macrocercus | | Black Drongo |
3 | Brahminy Kite | Haliastur indus | | Brahminy Kite |
4 | Common Myna | Acridotheres tristis | | Common Myna |
print(f"Size of 'kba_species_df' : {text_co.blue}{kba_species_df.shape}\n")
kba_species_df.head()
Size of 'kba_species_df' : (361, 9)
| | Common.Name | Scientific.Name | Resident.status | IUCN.Redlist.Status | Distribution.Status | SoIB.status | Order | Family | Endemicity |
---|---|---|---|---|---|---|---|---|---|
0 | Zitting Cisticola | Cisticola juncidis | Resident | Least Concern | Very Large | Low | Passeriformes | Cisticolidae | Not endemic |
1 | Ashy Prinia | Prinia socialis | Resident | Least Concern | Very Large | Low | Passeriformes | Cisticolidae | Not endemic |
2 | Yellow-throated Bulbul | Pycnonotus xantholaemus | Resident | Vulnerable | Moderate | Moderate | Passeriformes | Pycnonotidae | Not endemic |
3 | Yellow-footed Green-Pigeon | Treron phoenicopterus | Resident | Least Concern | Very Large | Low | Columbiformes | Columbidae | Not endemic |
4 | Yellow-eyed Babbler | Chrysomma sinense | Resident | Least Concern | Very Large | Low | Passeriformes | Sylviidae | Not endemic |
# Rename columns for convenience: replace '.' with '_'
kba_data_df.columns = ['_'.join(column.split('.')) for column in kba_data_df.columns]
kba_names_df.columns = ['_'.join(column.split('.')) for column in kba_names_df.columns]
kba_species_df.columns = ['_'.join(column.split('.')) for column in kba_species_df.columns]
# Checking for missing Values
print('kba_data dataframe')
print('-' * 18)
print(kba_data_df.isnull().sum(), '\n')
print('kba_names dataframe')
print('-' * 19)
print(kba_names_df.isnull().sum(), '\n')
print('kba_species dataframe')
print('-' * 21)
print(kba_species_df.isnull().sum())
kba_data dataframe
------------------
Common_Name      0
Date             0
Time             0
n_observers      0
County           0
Sub_cell       127
Season           0
DEM              0
Cell_ID        127
List_ID        127
dtype: int64

kba_names dataframe
-------------------
Common_Name                0
Scientific_Name            0
Action                     0
Assigned_Name_for_Atlas    0
dtype: int64

kba_species dataframe
---------------------
Common_Name            0
Scientific_Name        0
Resident_status        0
IUCN_Redlist_Status    0
Distribution_Status    0
SoIB_status            0
Order                  0
Family                 0
Endemicity             0
dtype: int64
missing_val_columns = ['Sub_cell', 'Cell_ID', 'List_ID']
for column in missing_val_columns:
    missing_count = kba_data_df[column].isnull().sum()
    per_value = missing_count * 100 / kba_data_df.shape[0]
    print(f'{text_co.blue}{column}{text_co.reset} is having {text_co.red}{missing_count}{text_co.reset} missing values ie, {text_co.red}{per_value:.4f}%{text_co.reset} of total data')
# Drop the three columns that contain missing values (the positional 'axis' argument is not accepted by newer pandas)
kba_data_df.drop(columns = missing_val_columns, inplace = True)
print(f"\nShape of '{text_co.blue}kba_data_df{text_co.reset}' after dropping columns {text_co.blue}{kba_data_df.shape}")
Sub_cell is having 127 missing values ie, 0.0422% of total data
Cell_ID is having 127 missing values ie, 0.0422% of total data
List_ID is having 127 missing values ie, 0.0422% of total data

Shape of 'kba_data_df' after dropping columns (300882, 7)
# Some names in 'kba_data_df' and 'kba_species_df' are different, so correcting it
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Gray Heron', 'Grey Heron', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Gray Wagtail', 'Grey Wagtail', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Gray Francolin', 'Grey Francolin', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Dollarbird', 'Oriental Dollarbird', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Gray Junglefowl', 'Grey Junglefowl', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Rock Pigeon', 'Rock Pigeon (Feral)', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Gray-headed Bulbul', 'Grey-headed Bulbul', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Large Gray Babbler', 'Large Grey Babbler', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Gray-bellied Cuckoo', 'Grey-bellied Cuckoo', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Gray-headed Lapwing', 'Grey-headed Lapwing', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Gray-headed Swamphen', 'Grey-headed Swamphen', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Gray-breasted Prinia', 'Grey-breasted Prinia', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Indian Gray Hornbill', 'Indian Grey Hornbill', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Malabar Gray Hornbill', 'Malabar Grey Hornbill', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Gray-fronted Green-Pigeon', 'Grey-fronted Green-Pigeon', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Gray-headed Canary-Flycatcher', 'Grey-headed Canary-Flycatcher', x))
# Convert 'Date' column from 'category' to 'datetime64[ns]' (dates are stored as month/day/year text)
kba_data_df['Date'] = pd.to_datetime(kba_data_df['Date'].astype(str), format = '%m/%d/%Y')
# Converting 'Time' column to integer for creating new feature
# https://www.geeksforgeeks.org/python-get-first-element-of-each-sublist/
# https://www.geeksforgeeks.org/python-converting-all-strings-in-list-to-integers/
# kba_data_df['Time_Hour'] = list(map(int, list(map(itemgetter(0), kba_data_df.Time.str.split(':')))))
kba_data_df.insert(6, 'Time_Hour', list(map(int, list(map(itemgetter(0), kba_data_df.Time.str.split(':'))))))
'''
Are the nested map() calls hard to read? The two lines below do the same job.
kba_data_df['Time2'] = list(map(itemgetter(0), kba_data_df.Time.str.split(':')))
kba_data_df['Time2'] = kba_data_df['Time2'].astype('int64')
'''
# kba_data_df['AM/PM'] = ['PM' if kba_data_df['Time2'][i]>12 else 'AM' for i in range(kba_data_df.shape[0])]
# Hours 12-23 are PM, so noon itself is labelled PM
kba_data_df.insert(7, 'AM/PM', np.where(kba_data_df['Time_Hour'] >= 12, 'PM', 'AM'))
# kba_data_df.drop(columns = 'Time_Hour', inplace = True)
kba_data_df['n_observers'] = kba_data_df['n_observers'].astype('int64') # Observer counts are whole numbers, so convert float to int.
styles = [dict(selector='caption', props=[('text-align', 'center'), ('font-size', '160%'), ('color', '#135EA9'), ('background-color' , '#F6E7DB')])]
kba_data_df.head(5).style.set_caption("Top 5 rows of 'kba_data_df' DataFrame").set_table_styles(styles)
| | Common_Name | Date | Time | n_observers | County | Season | Time_Hour | AM/PM | DEM |
---|---|---|---|---|---|---|---|---|---|
0 | Asian Koel | 2015-07-16 00:00:00 | 16:30 | 2 | Alappuzha | Wet | 16 | PM | 5.000000 |
1 | Black-rumped Flameback | 2015-07-16 00:00:00 | 16:30 | 2 | Alappuzha | Wet | 16 | PM | 5.000000 |
2 | Black Drongo | 2015-07-16 00:00:00 | 16:30 | 2 | Alappuzha | Wet | 16 | PM | 5.000000 |
3 | Brahminy Kite | 2015-07-16 00:00:00 | 16:30 | 2 | Alappuzha | Wet | 16 | PM | 5.000000 |
4 | Common Myna | 2015-07-16 00:00:00 | 16:30 | 2 | Alappuzha | Wet | 16 | PM | 5.000000 |
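The spelling corrections applied to `Common_Name` earlier use `re.sub`, which matches substrings. Running that cell twice would therefore turn 'Rock Pigeon (Feral)' into 'Rock Pigeon (Feral) (Feral)'. A hedged alternative is an exact-match lookup table, sketched below in plain Python; `NAME_FIXES` and `normalize_name` are hypothetical names, and only a few of the corrections are repeated.

```python
# Hypothetical lookup table: spelling in the field data -> spelling in the species table.
# Only a few of the corrections from the notebook are repeated here.
NAME_FIXES = {
    "Gray Heron": "Grey Heron",
    "Gray Wagtail": "Grey Wagtail",
    "Dollarbird": "Oriental Dollarbird",
    "Rock Pigeon": "Rock Pigeon (Feral)",
}

def normalize_name(name):
    """Return the corrected name for an exact match, else the name unchanged."""
    return NAME_FIXES.get(name, name)

names = ["Gray Heron", "Rock Pigeon", "Asian Koel"]
once = [normalize_name(n) for n in names]
twice = [normalize_name(n) for n in once]

print(once)
print(once == twice)  # True: a second pass changes nothing
```

On a pandas column, passing the same dictionary to `Series.replace` should give the equivalent exact-match behaviour.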
print(f'Number of distinct birds observed : {text_co.blue}{kba_data_df.Common_Name.nunique()}\n')
# Count records per species; naming the columns explicitly keeps this independent of the pandas version
temp_df = kba_data_df.Common_Name.value_counts().rename_axis('Bird_Name').reset_index(name = 'Count')
temp_df.index = np.arange(len(temp_df.index)) + 1
temp_df.head(20).style.background_gradient(cmap = 'Blues').set_caption('Most visited 20 birds with visit count').set_table_styles(styles)
Number of distinct birds observed : 492
| | Bird_Name | Count |
---|---|---|
1 | White-cheeked Barbet | 13860 |
2 | House Crow | 12381 |
3 | Large-billed Crow | 9641 |
4 | Common Myna | 9147 |
5 | Rufous Treepie | 8655 |
6 | Greater Coucal | 8265 |
7 | White-throated Kingfisher | 8017 |
8 | Purple-rumped Sunbird | 8003 |
9 | Common Tailorbird | 7612 |
10 | Red-whiskered Bulbul | 7567 |
11 | Greater Racket-tailed Drongo | 6456 |
12 | Indian Pond-Heron | 5962 |
13 | Oriental Magpie-Robin | 5872 |
14 | Black Drongo | 5526 |
15 | Jungle Babbler | 5330 |
16 | Black-rumped Flameback | 5026 |
17 | Pale-billed Flowerpecker | 4969 |
18 | Black-hooded Oriole | 4813 |
19 | Asian Koel | 4708 |
20 | Rock Pigeon (Feral) | 4554 |
# For every record count, how many species have exactly that many records
bird_visit_df = temp_df.Count.value_counts().rename_axis('No_of_visit').reset_index(name = 'Bird_Count').sort_values('No_of_visit')
for i in range(5):
    print(f'Number of birds that only visited {text_co.blue}{bird_visit_df.loc[i, "No_of_visit"]}{text_co.reset} time during the entire survey {text_co.red}:{text_co.reset} {text_co.blue}{bird_visit_df.loc[i, "Bird_Count"]}{text_co.reset}')
Number of birds that only visited 1 time during the entire survey : 31
Number of birds that only visited 2 time during the entire survey : 23
Number of birds that only visited 4 time during the entire survey : 21
Number of birds that only visited 3 time during the entire survey : 20
Number of birds that only visited 5 time during the entire survey : 14
plt.figure(figsize=(20,8))
sns.lineplot(x = bird_visit_df[bird_visit_df.No_of_visit < 200].No_of_visit, y = bird_visit_df[bird_visit_df.No_of_visit < 200].Bird_Count)
plt.suptitle('Number of Birds vs Visit chart', fontsize = 30, c = '#1D32E2')
plt.title('Where no. of visit by a single bird is < 200', fontsize = 15, c = '#C79438')
plt.xlabel('Number of visits')
plt.ylabel('Number of birds')
plt.show()
Q. Why did we limit the graph to fewer than 200 visits?
The curve becomes almost flat beyond 200. Once a visit count exceeds 200, only 1 or 2 birds share that count, and for the highest counts it is always a single bird.
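The chart above is a "frequency of frequencies": first count the records per bird, then count how many birds share each record count. A minimal sketch with made-up data (the bird names and counts are illustrative, not taken from the atlas):

```python
from collections import Counter

# Toy list of records, one entry per sighting (made-up data)
records = ["Koel", "Koel", "Koel", "Myna", "Myna", "Drongo", "Kite", "Barbet", "Barbet"]

records_per_species = Counter(records)                      # species -> number of records
species_per_count = Counter(records_per_species.values())   # number of records -> number of species

for n_records in sorted(species_per_count):
    print(n_records, species_per_count[n_records])
```

In real data the left end of this table is crowded (many birds seen only a few times) and the right end is sparse, which is why the curve flattens.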
Categorization Criteria
max_records = temp_df.Count.values[0]
very_rare = int(0.1 / 100 * max_records)
rare = int(1 / 100 * max_records)
common = int(10 / 100 * max_records)
very_common = int(50 / 100 * max_records)
print(f' Maximum recorded count for {text_co.blue}1{text_co.reset} bird {text_co.red}:{text_co.reset} {text_co.light_blue}{max_records}{text_co.reset}\n')
category = ['Very Rare', 'Rare', 'Common', 'Very Common', 'Most Abundant']
criterion_text = ['0.1% of max records', '0.1-1% of max records', '1-10% of max records', '10-50% of max records', '> 50% of max records']
cr_very_rare = temp_df.Count.values <= very_rare
cr_rare = (temp_df.Count.values > very_rare) & (temp_df.Count.values <= rare)
cr_common = (temp_df.Count.values > rare) & (temp_df.Count.values <= common)
cr_very_common = (temp_df.Count.values > common) & (temp_df.Count.values <= very_common)
cr_most_abundant = temp_df.Count.values > very_common
criterion_list = [cr_very_rare, cr_rare, cr_common, cr_very_common, cr_most_abundant]
species_count, bird_name_cr, cont_percentage, total_records = [], [], [], []
for criterion in criterion_list:
    species_count.append(np.count_nonzero(criterion))
    bird_name_cr.append(temp_df[criterion].Bird_Name.values)
for names in bird_name_cr:
    # Total number of records contributed by all species in this category
    sum_ = int(kba_data_df.Common_Name.isin(names).sum())
    total_records.append(sum_)
    cont_percentage.append(round(sum_ * 100 / len(kba_data_df), 2))
birds_distribution = pd.DataFrame({ 'Category' : category, 'Classification Criterion' : criterion_text,
'Birds Count' : species_count, 'Total Records' : total_records,
'% Contribution to KBA Dataset' : cont_percentage})
# https://datascientyst.com/set-caption-customize-font-size-color-in-pandas-dataframe/
df_caption = 'Categorizing birds based on their occurrence in the record data'
styles = [dict(selector='caption', props=[('text-align', 'center'), ('font-size', '160%'), ('color', '#135EA9'), ('background-color' , '#F6E7DB')])]
birds_distribution.style.set_caption(df_caption).set_table_styles(styles).highlight_max(subset = ['Birds Count', 'Total Records', '% Contribution to KBA Dataset'], color = '#DBEAF6')
Maximum recorded count for 1 bird : 13860
| | Category | Classification Criterion | Birds Count | Total Records | % Contribution to KBA Dataset |
---|---|---|---|---|---|
0 | Very Rare | 0.1% of max records | 167 | 796 | 0.260000 |
1 | Rare | 0.1-1% of max records | 148 | 8056 | 2.680000 |
2 | Common | 1-10% of max records | 123 | 62642 | 20.820000 |
3 | Very Common | 10-50% of max records | 44 | 136240 | 45.280000 |
4 | Most Abundant | > 50% of max records | 10 | 93148 | 30.960000 |
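The thresholds behind this table can be wrapped in a small helper. The sketch below is an assumption-labelled restatement, not the notebook's own code: `abundance_category` is a hypothetical function, and it uses integer division in place of the `int(...)` truncation, which gives the same cut-offs (13, 138, 1386 and 6930) for the maximum of 13860.

```python
def abundance_category(count, max_records):
    """Classify a species by its record count relative to the most recorded species."""
    # Integer division mirrors the int(...) truncation used for the thresholds above
    very_rare = max_records // 1000   # 0.1% of the maximum
    rare = max_records // 100         # 1% of the maximum
    common = max_records // 10        # 10% of the maximum
    very_common = max_records // 2    # 50% of the maximum
    if count <= very_rare:
        return "Very Rare"
    if count <= rare:
        return "Rare"
    if count <= common:
        return "Common"
    if count <= very_common:
        return "Very Common"
    return "Most Abundant"

MAX_RECORDS = 13860  # White-cheeked Barbet, the most recorded species
for count in (5, 885, 4554, 13860):
    print(count, abundance_category(count, MAX_RECORDS))
```

The four sample counts come from tables in this notebook (Blue-cheeked Bee-eater, Ashy Drongo, Rock Pigeon (Feral) and White-cheeked Barbet).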
# Let's join 'kba_data_df' and 'kba_species_df' on 'Common_Name' for further analysis
kba_species_set = set(kba_species_df.Common_Name.unique())
kba_data_set = set(kba_data_df.Common_Name.unique())
print(f"All the '{text_co.blue}kba_data_set{text_co.reset}' birds present in '{text_co.blue}kba_species_set{text_co.reset}' {text_co.red}? {text_co.blue}{len(kba_data_set) == len(kba_species_set)}{text_co.reset}\n")
print(f'What is the difference in bird count on these 2 data sets {text_co.red}? {text_co.red}:{text_co.reset} {text_co.blue}{len(kba_data_set) - len(kba_species_set)}{text_co.reset}\n')
kba_species_data = kba_data_df.merge(kba_species_df, on = 'Common_Name')
print(f'Shape of new joined data frame {text_co.red}:{text_co.reset} {text_co.blue}{kba_species_data.shape}{text_co.reset}\n')
print(f'Current number of unique birds in joined dataframe {text_co.red}:{text_co.reset} {text_co.blue}{kba_species_data.Common_Name.nunique()}')
All the 'kba_data_set' birds present in 'kba_species_set' ? False

What is the difference in bird count on these 2 data sets ? : 131

Shape of new joined data frame : (291543, 17)

Current number of unique birds in joined dataframe : 361
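Comparing set sizes shows how many names differ, but not which ones. Because the merge keeps only names present in both tables, the unmatched records are dropped (300882 rows become 291543). A set difference lists the names involved; the sketch below uses made-up toy sets, and "Unidentified Swift" is an invented example name.

```python
# Toy name sets (made-up; the real sets hold 492 and 361 names)
data_names = {"Asian Koel", "Common Myna", "Black Drongo", "Unidentified Swift"}
species_names = {"Asian Koel", "Common Myna", "Black Drongo"}

# Names recorded in the field data but absent from the species table
only_in_data = data_names - species_names
# Names listed in the species table that never occur in the field data
only_in_species = species_names - data_names

print(sorted(only_in_data))
print(sorted(only_in_species))
print(species_names <= data_names)  # True when every listed species was recorded
```

Applied to `kba_data_set` and `kba_species_set`, the same two differences would show which names the merge discards.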
# https://stackoverflow.com/a/41924823
# Records per district, smallest first; naming the columns explicitly keeps this independent of the pandas version
district_wise_count = kba_species_data.County.value_counts().rename_axis('District_Name').reset_index(name = 'Bird_Count').sort_values('Bird_Count', ascending = True).reset_index(drop = True)
median_count = int(district_wise_count.Bird_Count.median())
plt.figure(figsize=(20,8))
sns.lineplot(x = district_wise_count.District_Name, y= district_wise_count.Bird_Count)
plt.suptitle('Bird Count vs Districts', fontsize = 30, c = '#1D32E2')
plt.title('Total number of birds counted from each district of Kerala', fontsize = 15, c = '#C79438')
plt.xlabel('District Names of Kerala')
plt.ylabel('Number of birds counted')
plt.xticks(rotation = 20)
plt.axhline(median_count, c = 'g', linestyle = '-.', linewidth = 1.5)
plt.text('Kottayam', median_count , f'Median count :: {median_count}', fontsize = 14, va = 'center', ha = 'center', backgroundcolor = 'w', c = 'r')
plt.savefig('images/bird_vs_district.svg', bbox_inches = 'tight', pad_inches = 0.3)
plt.show()
number_birds = []
resident_list = kba_species_df.Resident_status.unique()
for resident in resident_list:
    number_birds.append(len(kba_species_df[kba_species_df.Resident_status == resident]))
pd.DataFrame({'Resident Status of Birds' : resident_list, 'Number of Birds' : number_birds, '% Contribution to KBA Dataset' : (number_birds / np.sum(number_birds) * 100)}).style.set_caption('Categorizing birds based on their Resident Status').set_table_styles(styles).highlight_max(subset = ['Number of Birds', '% Contribution to KBA Dataset'], color = '#DBEAF6')
| | Resident Status of Birds | Number of Birds | % Contribution to KBA Dataset |
---|---|---|---|
0 | Resident | 249 | 68.975069 |
1 | WinterMigrant | 111 | 30.747922 |
2 | SummerMigrant | 1 | 0.277008 |
kba_species_df[kba_species_df.Resident_status == 'SummerMigrant'][['Common_Name', 'Scientific_Name', 'Resident_status', 'IUCN_Redlist_Status']].style.set_caption("The only 1 'SummerMigrant' in this record").set_table_styles(styles)
| | Common_Name | Scientific_Name | Resident_status | IUCN_Redlist_Status |
---|---|---|---|---|
311 | Blue-cheeked Bee-eater | Merops persicus | SummerMigrant | Least Concern |
kba_species_data['Resident_count'] = np.ones(len(kba_species_data))
Name_Resident_Count = pd.DataFrame(kba_species_data.groupby(['Common_Name', 'Resident_status'])['Resident_count'].sum()).reset_index().sort_values('Resident_count', ascending = False).reset_index(drop=True)
temp_df = pd.DataFrame(columns = ['Common_Name', 'Resident_status', 'Resident_count'])
for resident in resident_list:
    temp_df = pd.concat([temp_df, Name_Resident_Count[Name_Resident_Count.Resident_status == resident].head()])
temp_df.Resident_count = temp_df.Resident_count.astype('int')
temp_df = temp_df.reset_index(drop = True)
idx = pd.IndexSlice
temp_df.style.set_caption('Top 5 Birds from each Resident category').set_table_styles(styles).set_properties(**{'background-color': '#E5F3FA'}, subset = idx[idx[:4]]).set_properties(**{'background-color': '#FEEDFF'}, subset = idx[idx[5:9]]).set_properties(**{'background-color': '#E0ECE4'}, subset = idx[idx[10]])
| | Common_Name | Resident_status | Resident_count |
---|---|---|---|
0 | White-cheeked Barbet | Resident | 13860 |
1 | House Crow | Resident | 12381 |
2 | Large-billed Crow | Resident | 9641 |
3 | Common Myna | Resident | 9147 |
4 | Rufous Treepie | Resident | 8655 |
5 | Blyth's Reed Warbler | WinterMigrant | 2753 |
6 | Indian Golden Oriole | WinterMigrant | 2387 |
7 | Blue-tailed Bee-eater | WinterMigrant | 1172 |
8 | Barn Swallow | WinterMigrant | 1108 |
9 | Ashy Drongo | WinterMigrant | 885 |
10 | Blue-cheeked Bee-eater | SummerMigrant | 5 |
print(f'Number of unique bird families present : {text_co.blue}{kba_species_df.Family.nunique()}{text_co.reset}')
print('=' * 43)
# Total Birds visit
season_count = kba_data_df.Season.value_counts()
for seas, count in zip(season_count.index, season_count.values):
    print(f'Number of total Birds observed during {text_co.blue}{seas} season{text_co.reset} is {text_co.blue}:{text_co.reset} {text_co.blue}{count}{text_co.reset}')
print('=' * 60)
# Unique Birds visit
for season in kba_data_df.Season.unique():
    bird_group_season = kba_data_df.groupby('Season').get_group(season).Common_Name
    unique_count = bird_group_season.nunique()
    print(f'Number of unique Birds visited during the {text_co.blue}{season} season{text_co.reset} is {text_co.blue}:{text_co.reset} {text_co.blue}{unique_count}{text_co.reset}')
Number of unique bird families present : 76
===========================================
Number of total Birds observed during Dry season is : 169704
Number of total Birds observed during Wet season is : 131178
============================================================
Number of unique Birds visited during the Wet season is : 404
Number of unique Birds visited during the Dry season is : 475
# Wet Season DataFrame
wet_season_bird_df = kba_data_df.groupby('Season').get_group('Wet').Common_Name.value_counts().rename_axis('Bird_Name').reset_index(name = 'Wet_Visit_Count_Count')
wet_season_bird_df = wet_season_bird_df[wet_season_bird_df.Wet_Visit_Count_Count != 0].sort_values('Wet_Visit_Count_Count', ascending = False)
# Dry Season DataFrame
dry_season_bird_df = kba_data_df.groupby('Season').get_group('Dry').Common_Name.value_counts().rename_axis('Bird_Name').reset_index(name = 'Dry_Visit_Count_Count')
dry_season_bird_df = dry_season_bird_df[dry_season_bird_df.Dry_Visit_Count_Count != 0].sort_values('Dry_Visit_Count_Count', ascending = False)
wet_season_bird_count = set(wet_season_bird_df.Bird_Name.values)
dry_season_bird_count = set(dry_season_bird_df.Bird_Name.values)
print(f'Number of Birds who visited in {text_co.red}both{text_co.reset} {text_co.blue}Dry{text_co.reset} and {text_co.blue}Wet Season{text_co.blue} is {text_co.red}:{text_co.reset} {text_co.blue}{len(wet_season_bird_count.intersection(dry_season_bird_count))}{text_co.reset}')
print(f'Number of Birds who visited {text_co.red}only{text_co.reset} in {text_co.blue}Dry Season{text_co.reset} is {text_co.red}:{text_co.blue} {text_co.blue}{len(dry_season_bird_count.difference(wet_season_bird_count))}{text_co.reset}')
print(f'Number of Birds who visited {text_co.red}only{text_co.reset} in {text_co.blue}Wet Season{text_co.reset} is {text_co.red}:{text_co.blue} {text_co.blue}{len(wet_season_bird_count.difference(dry_season_bird_count))}{text_co.reset}')
Number of Birds who visited in both Dry and Wet Season is : 387
Number of Birds who visited only in Dry Season is : 88
Number of Birds who visited only in Wet Season is : 17
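These three numbers can be cross-checked against the per-season totals printed earlier (475 unique birds in the dry season, 404 in the wet season, 492 overall). A small arithmetic sketch, with variable names chosen only for illustration:

```python
# Counts reported in the output above
both_seasons = 387
dry_only = 88
wet_only = 17

dry_total = both_seasons + dry_only    # birds seen at least once in the dry season
wet_total = both_seasons + wet_only    # birds seen at least once in the wet season
all_species = both_seasons + dry_only + wet_only

print(dry_total, wet_total, all_species)  # 475 404 492
```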
wet_season_bird_df.head(7).style.set_caption('Top 7 Birds visited during Wet Season').set_table_styles(styles)
| | Bird_Name | Wet_Visit_Count_Count |
---|---|---|
0 | House Crow | 6297 |
1 | White-cheeked Barbet | 5281 |
2 | Large-billed Crow | 4656 |
3 | Common Myna | 4569 |
4 | Rufous Treepie | 4480 |
5 | Greater Coucal | 4344 |
6 | Purple-rumped Sunbird | 4220 |
dry_season_bird_df.head(7).style.set_caption('Top 7 Birds visited during Dry Season').set_table_styles(styles)
| | Bird_Name | Dry_Visit_Count_Count |
---|---|---|
0 | White-cheeked Barbet | 8579 |
1 | House Crow | 6084 |
2 | Large-billed Crow | 4985 |
3 | Common Myna | 4578 |
4 | Red-whiskered Bulbul | 4236 |
5 | Rufous Treepie | 4175 |
6 | Greater Coucal | 3921 |
# Extract Year and Month for further analysis
# https://datagy.io/pandas-extract-date-from-datetime/
kba_data_df['Year'] = kba_data_df.Date.dt.year
kba_data_df['Month'] = kba_data_df.Date.dt.month_name()
# Let's plot some figures to build intuition
plt.figure(figsize=(15,8))
plt.suptitle('Number of Birds visited per Year chart', fontsize = 30, c = '#1D32E2')
plt.title('On each season', fontsize = 15, c = '#C79438')
plt.xlabel('Year of observation')
plt.ylabel('Number of birds visited')
sns.histplot(data = kba_data_df, x = 'Year', hue = 'Season', multiple = 'stack', palette = ['red', 'green'])
plt.show()
grouped = kba_data_df.groupby('Year')
img_index = 1
plt.figure(figsize=(20,20))
plt.suptitle('Monthly season wise Birds count distribution', fontsize = 30, c = '#1D32E2')
for name, group in grouped:
    plt.subplot(2, 3, img_index)
    plt.title(f' Year : {name}')
    sns.histplot(data = group, x = 'Month', hue = 'Season', multiple = 'stack')
    plt.xticks(rotation = 15)
    img_index += 1
plt.show()
# https://pandas.pydata.org/docs/reference/api/pandas.DatetimeIndex.weekday.html
# The day of the week with Monday=0, Sunday=6.
bird_watch_weekEnd = (kba_data_df.Date.dt.weekday >= 5).sum() # number of records made on a Saturday or Sunday
percent_weekend = int(bird_watch_weekEnd*100/len(kba_data_df))
print(f'Around {text_co.red}{percent_weekend}%{text_co.reset} of all survey records were made on weekends.')
Around 55% of all survey records were made on weekends.
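The weekend test relies on the convention Monday=0 ... Sunday=6, so any weekday value of 5 or more is a weekend. The same rule in plain Python, as a minimal sketch (`is_weekend` is a hypothetical helper; the first date comes from the data preview, the other two are made up):

```python
from datetime import datetime

def is_weekend(date_text):
    """True if an M/D/YYYY date string falls on a Saturday or Sunday."""
    day = datetime.strptime(date_text, "%m/%d/%Y")
    return day.weekday() >= 5  # Monday=0 ... Saturday=5, Sunday=6

# 7/16/2015 appears in the data preview; the other two dates are made up
dates = ["7/16/2015", "7/18/2015", "7/19/2015"]
flags = [is_weekend(d) for d in dates]
share = 100 * sum(flags) / len(flags)

print(flags)            # [False, True, True]
print(f"{share:.0f}%")  # 67%
```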
percnt_over_year = pd.DataFrame(columns = ['Year', 'Wet_Bird_Count', 'Dry_Bird_Count'], index=range(6))
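for se, co in zip(['Dry', 'Wet'], ['Dry_Bird_Count', 'Wet_Bird_Count']):
    # Row offset: dry-season counts are written starting at row 1, wet-season counts at row 0
    a = 1 if se == 'Dry' else 0
    for na, gr in kba_data_df[kba_data_df.Season == se].groupby('Year'):
        # Assign with a single .loc call so the value is written to the DataFrame itself
        percnt_over_year.loc[a, 'Year'] = na
        percnt_over_year.loc[a, co] = gr.shape[0]
        a += 1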
percnt_over_year.fillna(0, inplace = True)
percnt_over_year.insert(2, 'Wet_Cum_Percentage', percnt_over_year.Wet_Bird_Count.cumsum() / percnt_over_year.Wet_Bird_Count.sum())
percnt_over_year.insert(4, 'Dry_Cum_Percentage', percnt_over_year.Dry_Bird_Count.cumsum() / percnt_over_year.Dry_Bird_Count.sum())
# https://stackoverflow.com/a/41924823
plt.figure(figsize=(14, 8))
plt.suptitle('Percentage Bird watch completed in each year', fontsize = 30, c = '#1D32E2')
plt.title('(Cumulative Percentage)', fontsize = 15, c = '#C79438')
plt.xlabel('Year of observation')
plt.ylabel('Percentage completed')
sns.lineplot(data = percnt_over_year, y = 'Wet_Cum_Percentage', x = 'Year', color = 'red', label = 'Wet Season')
sns.lineplot(data = percnt_over_year, y = 'Dry_Cum_Percentage', x = 'Year', color = 'blue', label = 'Dry Season')
plt.axhline(0.92, linewidth=1, linestyle = '-.', c = 'g')
plt.axhline(0.73, linewidth=1, linestyle = '-.', c = 'g')
plt.text(2016, 0.92, '92% Cumulative', fontsize = 10, va = 'center', ha = 'center', backgroundcolor = 'w', c = 'g')
plt.text(2017, 0.73, '73% Cumulative', fontsize = 10, va = 'center', ha = 'center', backgroundcolor = 'w', c = 'g')
plt.legend(loc = 5)
plt.show()
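The cumulative curves above come from a running total divided by the grand total. A minimal sketch of that calculation with made-up yearly counts (not the atlas figures):

```python
from itertools import accumulate

# Made-up yearly record counts for one season (not the atlas figures)
years = [2015, 2016, 2017, 2018, 2019]
counts = [400, 300, 200, 50, 50]

total = sum(counts)
# Running total after each year, expressed as a share of all records
cumulative_share = [running / total for running in accumulate(counts)]

for year, share in zip(years, cumulative_share):
    print(year, f"{share:.0%}")
```

This mirrors what `cumsum() / sum()` does on the `Wet_Bird_Count` and `Dry_Bird_Count` columns.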
We also have the time of each observation. Let's check which hours of the day the data collectors surveyed most often.
def df_to_hour(df):
    # return pd.to_datetime(df.Time).dt.hour
    return df.Time_Hour
all_season = df_to_hour(kba_data_df)
wet_season_hr= df_to_hour(kba_data_df[kba_data_df['Season'] == 'Wet'])
dry_season_hr= df_to_hour(kba_data_df[kba_data_df['Season'] == 'Dry'])
season_time = [(all_season, 'All Season'), (wet_season_hr, 'Wet Season'), (dry_season_hr, 'Dry Season')]
plt.figure(figsize=(20, 8))
img_index = 1
for data, sea in season_time:
    plt.suptitle('Bird Watch Timings', fontsize = 30, c = '#1D32E2')
    plt.subplot(1, 3, img_index)
    plt.title(sea)
    sns.histplot(data = data, bins = 24)
    img_index += 1
plt.show()
endemicity_ = kba_species_df.Endemicity.value_counts().rename_axis('label').reset_index(name = 'Count')
fig = plt.figure()
ax = fig.add_axes([0,0,1.25,1.25])
ax.axis('equal')
plt.pie(endemicity_.Count, labels = endemicity_.label, autopct = '%1.2f%%', colors = ['#2ca12d', '#f77e12'])
plt.title("Bird's Endemicity", fontsize = 30, c = '#1D32E2')
plt.show()
def processDF(df, endemic, count_name):
    # Keep the species with the given endemicity, then count them per IUCN status.
    columns_ = ['Common_Name', 'Resident_status', 'IUCN_Redlist_Status', 'Family']
    df = df[df.Endemicity == endemic][columns_].reset_index(drop = True)
    iucn_df = df.IUCN_Redlist_Status.value_counts().rename_axis('IUCN_Status').reset_index(name = count_name)
    return df, iucn_df
western_birds, western_iucn = processDF(kba_species_df, 'Western Ghats', 'Western_G')
_, non_western_iucn = processDF(kba_species_df, 'Not endemic', 'non_Western_G')
print(f'\nNumber of different species families in {text_co.blue}Western Ghats {text_co.reset} : {text_co.red}{western_birds.Family.nunique()}{text_co.reset}\n')
# Colour the rows by IUCN status group (the row ranges match the sorted order).
(western_birds.sort_values('IUCN_Redlist_Status').reset_index(drop = True)
    .style.set_caption('All Birds from Western Ghats')
    .set_table_styles(styles)
    .set_properties(**{'background-color': '#FEEDFF'}, subset = idx[idx[:1]])
    .set_properties(**{'background-color': '#E5F3FA'}, subset = idx[idx[2:25]])
    .set_properties(**{'background-color': '#FBFDEE'}, subset = idx[idx[26:27]])
    .set_properties(**{'background-color': '#E0ECE4'}, subset = idx[idx[28:]]))
Number of different species families in Western Ghats : 21
Index | Common_Name | Resident_status | IUCN_Redlist_Status | Family |
---|---|---|---|---|
0 | Banasura Laughingthrush | Resident | Endangered | Leiothrichidae |
1 | Nilgiri Laughingthrush | Resident | Endangered | Leiothrichidae |
2 | White-bellied Treepie | Resident | Least Concern | Corvidae |
3 | Black-and-orange Flycatcher | Resident | Least Concern | Muscicapidae |
4 | Square-tailed Bulbul | Resident | Least Concern | Pycnonotidae |
5 | Crimson-backed Sunbird | Resident | Least Concern | Nectariniidae |
6 | Wayanad Laughingthrush | Resident | Least Concern | Leiothrichidae |
7 | Grey-fronted Green-Pigeon | Resident | Least Concern | Columbidae |
8 | Orange Minivet | Resident | Least Concern | Campephagidae |
9 | Legge's Hawk-Eagle | Resident | Least Concern | Accipitridae |
10 | Indian Swiftlet | Resident | Least Concern | Apodidae |
11 | Malabar Barbet | Resident | Least Concern | Megalaimidae |
12 | Malabar Grey Hornbill | Resident | Least Concern | Bucerotidae |
13 | Rufous Babbler | Resident | Least Concern | Leiothrichidae |
14 | Malabar Parakeet | Resident | Least Concern | Psittaculidae |
15 | Malabar Lark | Resident | Least Concern | Alaudidae |
16 | Malabar Woodshrike | Resident | Least Concern | Vangidae |
17 | White-bellied Blue Flycatcher | Resident | Least Concern | Muscicapidae |
18 | Dark-fronted Babbler | Resident | Least Concern | Timaliidae |
19 | Hill Swallow | Resident | Least Concern | Hirundinidae |
20 | Malabar Starling | Resident | Least Concern | Sturnidae |
21 | Flame-throated Bulbul | Resident | Least Concern | Pycnonotidae |
22 | Nilgiri Thrush | Resident | Least Concern | Turdidae |
23 | Nilgiri Flycatcher | Resident | Least Concern | Muscicapidae |
24 | Nilgiri Flowerpecker | Resident | Least Concern | Dicaeidae |
25 | Yellow-browed Bulbul | Resident | Least Concern | Pycnonotidae |
26 | Grey-headed Bulbul | Resident | Near Threatened | Pycnonotidae |
27 | Palani Laughingthrush | Resident | Near Threatened | Leiothrichidae |
28 | Nilgiri Pipit | Resident | Vulnerable | Motacillidae |
29 | Broad-tailed Grassbird | Resident | Vulnerable | Locustellidae |
30 | White-bellied Sholakili | Resident | Vulnerable | Muscicapidae |
31 | Ashambu Laughingthrush | Resident | Vulnerable | Leiothrichidae |
32 | Nilgiri Wood-Pigeon | Resident | Vulnerable | Columbidae |
western_birds_only = pd.merge(kba_data_df[['Common_Name', 'Season', 'County']], western_birds, on = 'Common_Name', how = 'inner')
# Six most frequently recorded endemic species (each has more than 1000 records).
western_birds_only.Common_Name.value_counts().rename_axis('Common_Name').reset_index(name = 'Visit_Count_During_Survey').head(6).style.set_caption('Endemic Species with 1000+ Visits').set_table_styles(styles)
Index | Common_Name | Visit_Count_During_Survey |
---|---|---|
0 | Yellow-browed Bulbul | 2687 |
1 | Nilgiri Flowerpecker | 2288 |
2 | Orange Minivet | 2286 |
3 | Crimson-backed Sunbird | 1781 |
4 | Malabar Parakeet | 1443 |
5 | Malabar Grey Hornbill | 1346 |
iucn_status = pd.merge(western_iucn, non_western_iucn, how = 'outer').fillna(0)
iucn_status['Western_G'] = iucn_status['Western_G'].apply(np.int64)
# Share of each IUCN status within the endemic and non-endemic groups, in percent.
iucn_status.insert(2, '% Western_G', (iucn_status['Western_G'] * 100 / iucn_status['Western_G'].sum()).round(2))
iucn_status.insert(4, '% Non_Western_G', (iucn_status['non_Western_G'] * 100 / iucn_status['non_Western_G'].sum()).round(2))
iucn_status.style.set_caption('Endemic Species IUCN Status Count').set_table_styles(styles).highlight_max(subset = ['Western_G', '% Western_G', 'non_Western_G', '% Non_Western_G'], color = '#DBEAF6')
Index | IUCN_Status | Western_G | % Western_G | non_Western_G | % Non_Western_G |
---|---|---|---|---|---|
0 | Least Concern | 24 | 72.73 | 303 | 92.38 |
1 | Vulnerable | 5 | 15.15 | 6 | 1.83 |
2 | Near Threatened | 2 | 6.06 | 16 | 4.88 |
3 | Endangered | 2 | 6.06 | 1 | 0.30 |
4 | Critically Endangered | 0 | 0.00 | 2 | 0.61 |
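The percentage columns in the table above are each status count divided by its group total. As a cross-check, the same shares can be obtained directly with `value_counts(normalize=True)`. A minimal sketch on a made-up series (the `status` data below is hypothetical, not taken from `kba_species_df`):

```python
import pandas as pd

# Hypothetical statuses for four species.
status = pd.Series(['Least Concern'] * 3 + ['Vulnerable'], name = 'IUCN_Redlist_Status')

# normalize=True returns fractions of the total; scale to percent and round.
share = status.value_counts(normalize = True).mul(100).round(2)
print(share)
```

This avoids computing the group total by hand, at the cost of not keeping the raw counts alongside the percentages.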
# Records of endemic species per district, sorted from fewest to most.
district_wise_count = western_birds_only.County.value_counts().rename_axis('District_Name').reset_index(name = 'Bird_Count').sort_values('Bird_Count', ascending = True).reset_index(drop=True)
median_count = int(district_wise_count.Bird_Count.median())
plt.figure(figsize=(20,8))
sns.lineplot(x = district_wise_count.District_Name, y= district_wise_count.Bird_Count)
plt.suptitle('Western Ghats Bird Count vs Districts', fontsize = 30, c = '#1D32E2')
plt.title('Total number of birds counted from each district of Kerala', fontsize = 15, c = '#C79438')
plt.xlabel('District Names of Kerala')
plt.ylabel('Number of birds counted')
plt.xticks(rotation = 20)
plt.axhline(median_count, c = 'g', linestyle = '-.', linewidth = 1.5)
plt.text('Kottayam', median_count , f'Median count :: {median_count}', fontsize = 14, va = 'center', ha = 'center', backgroundcolor = 'w', c = 'r')
plt.show()