Kerala Bird Atlas (KBA) -EDA
¶

The Kerala Bird Atlas (KBA), the first-of-its-kind state-level bird atlas in India, has created solid baseline data about the distribution and abundance of various bird species across all major habitats giving an impetus for futuristic studies.

The entire state of Kerala was systematically surveyed twice a year during 2015–20. It is arguably Asia’s largest bird atlas in terms of geographical extent, sampling effort and species coverage derived from the aggregation of 25,000 checklists.

KBA accounted for nearly three lakh records of 361 species, including 94 very rare species, 103 rare species, 110 common species, 44 very common species, and 10 most abundant species.

Objective¶

Citizen-science driven exercises (e.g. bird surveys) and online platforms (e.g. eBird) provide voluminous data on bird occurrence. However, the semi-structured nature of their data collection makes it difficult to compare bird distribution across space and time.

Data on the distribution of species and the factors governing the same are prerequisites for effective and efficient conservation efforts. Such information is necessary to inform the selection of protected areas, to assess habitat associations and to predict the likely effects of future en-vironmental changes.

Methodology¶

Spatial extent¶

Kerala lies between 8°18'N and 12°48'N lat. and 74°52'E and 77°22'E long. in southwestern India. Wedged between the Arabian Sea and the windward side of the Western Ghats, it receives abundant rainfall (180–360 cm) and experiences a tropical climate. Elevation in the region varies from –2.2 m (Kuttanad) to 2695 m (Anamudi peak). It is spread across an area of 38,863 sq. km, of which 27% is under forest cover, 66% is under cultivation and 7% constitutes built-up areas/wetlands/uncultivated land. Onefourth of the Western Ghats range falls within Kerala. Surveys for KBA were conducted in all 14 administrative units (districts) of Kerala.

Sampling protocol¶

Kerala was divided into cells of size 3.75 min x 3.75 min (equivalent to 6.6 km x 6.6 km) aligned to Survey of India maps. A total of 915 cells were laid out covering the entire state. Each cell was further divided into four quadrants of size 3.3 km x 3.3 km. Each quadrant was then sub-divided into 9 sub-cells of size 1.1 x 1.1 km. A single, randomly selected sub-cell in every quadrant was chosen for the survey. Grids were laid and the randomly selected sub-cells were marked on the map prior to the survey. A total of 63 sub-cells were found to be located in inaccessible cliffs or valleys, and these were replaced by adjacent accessible sub-cells with the same habitat type from the same quadrants.

Dataset¶

This observation dataset contains 3 files :

kba_data.rds
kba_names.rds
kba_species.details.rds

Reference : Bird Count India, The Hindu, Times of India, Mysore Bird Atlas

# How to deal RDS files inside python 🐍 ?

RDS Format is a file format for storing R objects. Our data can be saved in RDS format. The .rds file extension is most often used in R.

Saving data into R data formats can reduce the size of large files by considerably.

💡 We can process .rds files using pyreadr package.

We can install it easily with pip: pip install pyreadr

!pip install pyreadr

import pyreadr

result = pyreadr.read_r('../path_to_file.rds') # also works for RData
print(f"Class type of 'result' \t:: {type(result)}")

df = result[None] # extract the pandas data frame
print(f"Class type of 'df' \t:: {type(df)}")

>>> Class type of 'result' 	:: <class 'collections.OrderedDict'>
>>> Class type of 'df' 	    :: <class 'pandas.core.frame.DataFrame'>

Reference : StackoverFlow, GitHub

More about Dataset¶

Date : The field surveys were conducted from 2015 to 2020.
Season : The field surveys were conducted twice a year ie, during
- Dry (mid-January to mid-March) season
  - This season coincides with the peak activity of migratory species.
- Wet (mid-July to mid-September) season
  - This season (monsoon) coincides with the breeding period of many resident species.
n.observers aka Number of observers
- The number of observers or volunteers for atlas surveys.
- Though the protocol mentioned 2–5 volunteers for each atlas surveys groups, but there are occations more volunteers involved in the survey.
- So the number of observers ranges from 2-13.
We calculated the endemic score of every cell based on the number of endemic species reported from it.
- The endemic score is the sum of scores/total species count per unit checklist
  - 1 : The species restricted to Western Ghats–Sri Lanka biodiversity hotspot
  - 0 : The rest (non-endemic species).
We calculated the threat and SoIB (State of India’s Bird) score for every cell. SoIB utilized the eBird data to estimate indices of population trends and range size for 867 of India’s 1333 bird species.
- 2 : High
- 1 : Moderate
- 0 : Low
Threat categories were based on the IUCN Red List 32 and were scored as follows:
- 4 : Critically endangered
- 3 : Endangered
- 2 : Vulnerable
- 1 : Near-threatened
- 0 : Least-concern

NOTE :

There was the possibility of passage migrants (e.g. Eurasian Cuckoo, Amur Falcon) crossing through Kerala for a very short duration (a couple of weeks) during the intervening un-surveyed months. The atlas survey did not focus on such passage migrants.
The threat score and SoIB score were mapped separately for wet and dry seasons.

In [1]:

# !pip install pyreadr # https://github.com/ofajardo/pyreadr#installation
import pyreadr

import re
import numpy as np
import pandas as pd

from os import listdir

from operator import itemgetter

import seaborn as sns
import matplotlib.pyplot as plt

from tqdm import tqdm # For progress bar

# Suppress warnings 
import warnings
warnings.filterwarnings('ignore')

# Settings for pretty nice plots
# https://matplotlib.org/3.5.1/gallery/style_sheets/style_sheets_reference.html
plt.style.use('fivethirtyeight')
plt.show()


# For highlighting text on 'print' command
class text_co:
    reset = '\u001b[0m'
    blue = '\u001b[34;1m'
    red = '\u001b[31;1m'
    light_blue = '\u001b[34m'

Reading the Datasets¶

In [2]:

# List of files available

print(listdir('../Kerala_Birds/data_files'))

['kba_species.details.rds', 'kba_names.rds', 'kba_data.rds']

📌 Note : Here we are only considering .rds files.

In [3]:

# Like mentioned above we are using "pyreadr" for reading ".rds" files.
PATH = '../Kerala_Birds/data_files/'

kba_data = pyreadr.read_r(PATH + 'kba_data.rds')
kba_names = pyreadr.read_r(PATH + 'kba_names.rds')
kba_species = pyreadr.read_r(PATH + 'kba_species.details.rds')

kba_data_df = kba_data[None]
kba_names_df = kba_names[None]
kba_species_df = kba_species[None]

# Let's check and confirm all are Pandas dataframes

print(f"Class type of 'kba_data_df' is    : {type(kba_data_df)}")
print(f"Class type of 'kba_names_df' is   : {type(kba_names_df)}")
print(f"Class type of 'kba_species_df' is : {type(kba_species_df)}")

Class type of 'kba_data_df' is    : <class 'pandas.core.frame.DataFrame'>
Class type of 'kba_names_df' is   : <class 'pandas.core.frame.DataFrame'>
Class type of 'kba_species_df' is : <class 'pandas.core.frame.DataFrame'>

🥳 All are Pandas dataframes.

In [4]:

# Let's see aand check the size of our all 3 datasets before going to exploration

print(f"Size of 'kba_data_df'    : {text_co.blue}{kba_data_df.shape}\n")
kba_data_df.head()

Size of 'kba_data_df'    : (300882, 10)

Out[4]:

	Common.Name	Date	Time	n.observers	County	Sub.cell	Season	DEM	Cell.ID	List.ID
0	Asian Koel	7/16/2015	16:30	2.0	Alappuzha	[51,2,2]	Wet	5.0	[76.28,9.84]	List.1
1	Black-rumped Flameback	7/16/2015	16:30	2.0	Alappuzha	[51,2,2]	Wet	5.0	[76.28,9.84]	List.1
2	Black Drongo	7/16/2015	16:30	2.0	Alappuzha	[51,2,2]	Wet	5.0	[76.28,9.84]	List.1
3	Brahminy Kite	7/16/2015	16:30	2.0	Alappuzha	[51,2,2]	Wet	5.0	[76.28,9.84]	List.1
4	Common Myna	7/16/2015	16:30	2.0	Alappuzha	[51,2,2]	Wet	5.0	[76.28,9.84]	List.1

In [5]:

print(f"Size of 'kba_names_df'   : {text_co.blue}{kba_names_df.shape}\n")
kba_names_df.head()

Size of 'kba_names_df'   : (492, 4)

Out[5]:

	Common.Name	Scientific.Name	Assigned.Name.for.Atlas
0	Asian Koel	Eudynamys scolopaceus	Asian Koel
1	Black-rumped Flameback	Dinopium benghalense	Black-rumped Flameback
2	Black Drongo	Dicrurus macrocercus	Black Drongo
3	Brahminy Kite	Haliastur indus	Brahminy Kite
4	Common Myna	Acridotheres tristis	Common Myna

In [6]:

print(f"Size of 'kba_species_df' : {text_co.blue}{kba_species_df.shape}\n")
kba_species_df.head()

Size of 'kba_species_df' : (361, 9)

Out[6]:

	Common.Name	Scientific.Name	Resident.status	IUCN.Redlist.Status	Distribution.Status	SoIB.status	Order	Family	Endemicity
0	Zitting Cisticola	Cisticola juncidis	Resident	Least Concern	Very Large	Low	Passeriformes	Cisticolidae	Not endemic
1	Ashy Prinia	Prinia socialis	Resident	Least Concern	Very Large	Low	Passeriformes	Cisticolidae	Not endemic
2	Yellow-throated Bulbul	Pycnonotus xantholaemus	Resident	Vulnerable	Moderate	Moderate	Passeriformes	Pycnonotidae	Not endemic
3	Yellow-footed Green-Pigeon	Treron phoenicopterus	Resident	Least Concern	Very Large	Low	Columbiformes	Columbidae	Not endemic
4	Yellow-eyed Babbler	Chrysomma sinense	Resident	Least Concern	Very Large	Low	Passeriformes	Sylviidae	Not endemic

Missing Values¶

In [7]:

# Renaming column names for easiness. Replacing '.' with '_'

kba_data_df.columns = ['_'.join(column.split('.')) for column in kba_data_df.columns]
kba_names_df.columns = ['_'.join(column.split('.')) for column in kba_names_df.columns]
kba_species_df.columns = ['_'.join(column.split('.')) for column in kba_species_df.columns]

# Checking for missing Values
print('kba_data dataframe')
print('-' * 18)
print(kba_data_df.isnull().sum(), '\n')

print('kba_names dataframe')
print('-' * 19)
print(kba_names_df.isnull().sum(), '\n')

print('kba_species dataframe')
print('-' * 21)
print(kba_species_df.isnull().sum())

kba_data dataframe
------------------
Common_Name      0
Date             0
Time             0
n_observers      0
County           0
Sub_cell       127
Season           0
DEM              0
Cell_ID        127
List_ID        127
dtype: int64 

kba_names dataframe
-------------------
Common_Name                0
Scientific_Name            0
Action                     0
Assigned_Name_for_Atlas    0
dtype: int64 

kba_species dataframe
---------------------
Common_Name            0
Scientific_Name        0
Resident_status        0
IUCN_Redlist_Status    0
Distribution_Status    0
SoIB_status            0
Order                  0
Family                 0
Endemicity             0
dtype: int64

In [8]:

missing_val_columns = ['Sub_cell', 'Cell_ID', 'List_ID']

for column in missing_val_columns:
    missing_count = kba_data_df[column].isnull().sum()
    per_value = missing_count * 100 / kba_data_df.shape[0]
    print(f'{text_co.blue}{column}{text_co.reset} is having {text_co.red}{missing_count}{text_co.reset} missing values ie, {text_co.red}{per_value:.4f}%{text_co.reset} of total data')

kba_data_df.drop(missing_val_columns, 1, inplace = True)

print(f"\nShape of '{text_co.blue}kba_data_df{text_co.reset}' after dropping columns {text_co.blue}{kba_data_df.shape}")

Sub_cell is having 127 missing values ie, 0.0422% of total data
Cell_ID is having 127 missing values ie, 0.0422% of total data
List_ID is having 127 missing values ie, 0.0422% of total data

Shape of 'kba_data_df' after dropping columns (300882, 7)

Generating New Features¶

In [9]:

# Some names in 'kba_data_df' and 'kba_species_df' are different, so correcting it

kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Gray Heron', 'Grey Heron', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Gray Wagtail', 'Grey Wagtail', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Gray Francolin', 'Grey Francolin', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Dollarbird', 'Oriental Dollarbird', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Gray Junglefowl', 'Grey Junglefowl', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Rock Pigeon', 'Rock Pigeon (Feral)', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Gray-headed Bulbul', 'Grey-headed Bulbul', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Large Gray Babbler', 'Large Grey Babbler', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Gray-bellied Cuckoo', 'Grey-bellied Cuckoo', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Gray-headed Lapwing', 'Grey-headed Lapwing', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Gray-headed Swamphen', 'Grey-headed Swamphen', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Gray-breasted Prinia', 'Grey-breasted Prinia', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Indian Gray Hornbill', 'Indian Grey Hornbill', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Malabar Gray Hornbill', 'Malabar Grey Hornbill', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Gray-fronted Green-Pigeon', 'Grey-fronted Green-Pigeon', x))
kba_data_df['Common_Name'] = kba_data_df.Common_Name.apply(lambda x : re.sub(r'Gray-headed Canary-Flycatcher', 'Grey-headed Canary-Flycatcher', x))

In [10]:

#Convert 'Date' column from 'category' to 'datetime64[ns]'

kba_data_df['Date'] = pd.to_datetime(kba_data_df['Date']).astype('datetime64')

# Converting 'Time' column to integer for creating new feature

# https://www.geeksforgeeks.org/python-get-first-element-of-each-sublist/
# https://www.geeksforgeeks.org/python-converting-all-strings-in-list-to-integers/

# kba_data_df['Time_Hour'] = list(map(int, list(map(itemgetter(0), kba_data_df.Time.str.split(':')))))
kba_data_df.insert(6, 'Time_Hour', list(map(int, list(map(itemgetter(0), kba_data_df.Time.str.split(':'))))))
'''
map()'s making complicated ??? Instead of this we can use the bellow also. Dose the same job.

kba_data_df['Time2'] = list(map(itemgetter(0), kba_data_df.Time.str.split(':')))
kba_data_df['Time2'] = kba_data_df['Time2'].astype('int64')
'''
# kba_data_df['AM/PM'] = ['PM' if kba_data_df['Time2'][i]>12 else 'AM' for i in range(kba_data_df.shape[0])]

kba_data_df.insert(7, 'AM/PM', ['PM' if kba_data_df['Time_Hour'][i]>12 else 'AM' for i in range(kba_data_df.shape[0])]
)

# kba_data_df.drop('Time_Hour', 1, inplace = True)

kba_data_df['n_observers'] = pd.to_datetime(kba_data_df['n_observers']).astype('int64') # Human count is always int, so converting into int.

styles = [dict(selector='caption', props=[('text-align', 'center'), ('font-size', '160%'), ('color', '#135EA9'), ('background-color' , '#F6E7DB')])]

kba_data_df.head(5).style.set_caption("Top 5 rows of 'kba_data_df' DataFrame").set_table_styles(styles)

Out[10]:

Top 5 rows of 'kba_data_df' DataFrame
	Common_Name	Date	Time	n_observers	County	Season	Time_Hour	AM/PM	DEM
0	Asian Koel	2015-07-16 00:00:00	16:30	2	Alappuzha	Wet	16	PM	5.000000
1	Black-rumped Flameback	2015-07-16 00:00:00	16:30	2	Alappuzha	Wet	16	PM	5.000000
2	Black Drongo	2015-07-16 00:00:00	16:30	2	Alappuzha	Wet	16	PM	5.000000
3	Brahminy Kite	2015-07-16 00:00:00	16:30	2	Alappuzha	Wet	16	PM	5.000000
4	Common Myna	2015-07-16 00:00:00	16:30	2	Alappuzha	Wet	16	PM	5.000000

Number of Distinct Birds Observed¶

In [11]:

print(f'Number of distinct birds observed : {text_co.blue}{kba_data_df.Common_Name.nunique()}\n')

temp_df = pd.DataFrame(kba_data_df.Common_Name.value_counts())
temp_df.insert(0, 'Bird_Name', temp_df.index)
temp_df.index = np.arange(len(temp_df.index)) + 1
temp_df.rename(columns={'Common_Name': 'Count'}, inplace=True)

temp_df.head(20).style.background_gradient(cmap = 'Blues').set_caption('Most visited 20 birds with visit count').set_table_styles(styles)

Number of distinct birds observed : 492

Out[11]:

Most visited 20 birds with visit count
	Bird_Name	Count
1	White-cheeked Barbet	13860
2	House Crow	12381
3	Large-billed Crow	9641
4	Common Myna	9147
5	Rufous Treepie	8655
6	Greater Coucal	8265
7	White-throated Kingfisher	8017
8	Purple-rumped Sunbird	8003
9	Common Tailorbird	7612
10	Red-whiskered Bulbul	7567
11	Greater Racket-tailed Drongo	6456
12	Indian Pond-Heron	5962
13	Oriental Magpie-Robin	5872
14	Black Drongo	5526
15	Jungle Babbler	5330
16	Black-rumped Flameback	5026
17	Pale-billed Flowerpecker	4969
18	Black-hooded Oriole	4813
19	Asian Koel	4708
20	Rock Pigeon (Feral)	4554

In [12]:

bird_visit_df = pd.DataFrame(temp_df.Count.value_counts(ascending=False)).reset_index().rename(columns = {'index' : 'No_of_visit', 'Count' : 'Bird_Count'}).sort_values('No_of_visit')

for i in range(5):
    print(f'Number of birds that only visited {text_co.blue}{bird_visit_df.loc[i][0]}{text_co.reset} time during the entire survey {text_co.red}:{text_co.reset} {text_co.blue}{bird_visit_df.loc[i][1]}{text_co.reset}')

Number of birds that only visited 1 time during the entire survey : 31
Number of birds that only visited 2 time during the entire survey : 23
Number of birds that only visited 4 time during the entire survey : 21
Number of birds that only visited 3 time during the entire survey : 20
Number of birds that only visited 5 time during the entire survey : 14

In [13]:

plt.figure(figsize=(20,8))
sns.lineplot(x = bird_visit_df[bird_visit_df.No_of_visit < 200].No_of_visit, y = bird_visit_df[bird_visit_df.No_of_visit < 200].Bird_Count)
plt.suptitle('Number of Birds vs Visit chart', fontsize = 30, c = '#1D32E2')
plt.title('Where no. of visit by a single bird is < 200', fontsize = 15, c = '#C79438')
plt.xlabel('Number of visits')
plt.ylabel('Number of birds')
plt.show()

Q. Why we are exactly limited the graph to < 200 ??

The curve is became almost flat after 200. This indicates that if a bird visited more than 200 times, then the bird count is 1 or 2 and for higher visits it becomes simply 1

Observation

White-cheeked Barbet (which is a Resident one) visited for 13,860 times during the survey period and this is the most recorded bird.
White-cheeked Barbet and House Crow had over 10,000 records.
There are 31 birds, those had only single occurance in the database, means only one visit during the entire survey.

Categorizing Birds Based on the Occurance of Records (Both Dry and Wet Season Combined)¶

Categorization Criteria

Very Rare ::: 0.1% of max records
Rare ::: 0.1 - 1% of max records
Common ::: 1 - 10% of max records
Very Common ::: 10 - 50% of max records
Most Abundant ::: > 50% of max records

In [14]:

max_records = temp_df.Count.values[0]
very_rare = int(0.1 / 100 * max_records)
rare = int(1 / 100 * max_records)
common = int(10 / 100 * max_records)
very_common = int(50 / 100 * max_records)

print(f' Maximum recorded count for {text_co.blue}1{text_co.reset} bird {text_co.red}:{text_co.reset} {text_co.light_blue}{max_records}{text_co.reset}\n')

category = ['Very Rare', 'Rare', 'Common', 'Very Common', 'Most Abundant']
criterion_text = ['0.1% of max records', '0.1-1% of max records', '1-10% of max records', '10-50% of max records', '> 50% of max records']

cr_very_rare = temp_df.Count.values <= very_rare
cr_rare = (temp_df.Count.values > very_rare) & (temp_df.Count.values <= rare)
cr_common = (temp_df.Count.values > rare) & (temp_df.Count.values <= common)
cr_very_common = (temp_df.Count.values > common) & (temp_df.Count.values <= very_common)
cr_most_abundant = temp_df.Count.values > very_common

criterion_list = [cr_very_rare, cr_rare, cr_common, cr_very_common, cr_most_abundant]

species_count, bird_name_cr, cont_percentage, total_records = [], [], [], []

for criterion in criterion_list:
    species_count.append(np.count_nonzero(criterion))
    bird_name_cr.append(temp_df[criterion].Bird_Name.values)

for count, name in zip(species_count, bird_name_cr):
    sum_ = 0
    for i in range(count):
        count = len(kba_data_df[kba_data_df.Common_Name == name[i]])
        sum_ += count
    total_records.append(sum_)
    cont_percentage.append(round(sum_ * 100 / len(kba_data_df), 2))
    
birds_distribution = pd.DataFrame({ 'Category' : category, 'Classification Criterion' : criterion_text,
                                   'Birds Count' : species_count, 'Total Records' : total_records,
                                   '% Contribution to KBA Dataset' : cont_percentage})

# https://datascientyst.com/set-caption-customize-font-size-color-in-pandas-dataframe/

df_caption = 'Categorizing birds based on the occurance in record data'

styles = [dict(selector='caption', props=[('text-align', 'center'), ('font-size', '160%'), ('color', '#135EA9'), ('background-color' , '#F6E7DB')])]

birds_distribution.style.set_caption(df_caption).set_table_styles(styles).highlight_max(subset = ['Birds Count', 'Total Records', '% Contribution to KBA Dataset'], color = '#DBEAF6')

 Maximum recorded count for 1 bird : 13860

Out[14]:

Categorizing birds based on the occurance in record data
	Category	Classification Criterion	Birds Count	Total Records	% Contribution to KBA Dataset
0	Very Rare	0.1% of max records	167	796	0.260000
1	Rare	0.1-1% of max records	148	8056	2.680000
2	Common	1-10% of max records	123	62642	20.820000
3	Very Common	10-50% of max records	44	136240	45.280000
4	Most Abundant	> 50% of max records	10	93148	30.960000

In [15]:

# Let's join 'kba_data_df' and 'kba_species_df' on 'Common_Name' for further analysis

kba_species_set = set(kba_species_df.Common_Name.unique())
kba_data_set = set(kba_data_df.Common_Name.unique())

print(f"All the '{text_co.blue}kba_data_set{text_co.reset}' birds present in '{text_co.blue}kba_species_set{text_co.reset}' {text_co.red}? {text_co.blue}{len(kba_data_set) == len(kba_species_set)}{text_co.reset}\n")
print(f'What is the diffence in bird count on these 2 data sets {text_co.red}? {text_co.red}:{text_co.reset} {text_co.blue}{len(kba_data_set) - len(kba_species_set)}{text_co.reset}\n')

kba_species_data = kba_data_df.merge(kba_species_df, on = 'Common_Name')

print(f'Shape of new joined data frame {text_co.red}:{text_co.reset} {text_co.blue}{kba_species_data.shape}{text_co.reset}\n')

print(f'Current number of unique birds in joined dataframe {text_co.red}:{text_co.reset} {text_co.blue}{kba_species_data.Common_Name.nunique()}')

All the 'kba_data_set' birds present in 'kba_species_set' ? False

What is the diffence in bird count on these 2 data sets ? : 131

Shape of new joined data frame : (291543, 17)

Current number of unique birds in joined dataframe : 361

In [16]:

# https://stackoverflow.com/a/41924823

district_wise_count = pd.DataFrame(kba_species_data.County.value_counts()).reset_index().rename(columns = {'index' : 'District_Name', 'County' : 'Bird_Count'}).sort_values('Bird_Count', ascending = True).reset_index(drop=True)
median_count = int(district_wise_count.Bird_Count.median())

plt.figure(figsize=(20,8))
sns.lineplot(x = district_wise_count.District_Name, y= district_wise_count.Bird_Count)
plt.suptitle('Bird Count vs Districts', fontsize = 30, c = '#1D32E2')
plt.title('Total number of birds counted from each district of Kerala', fontsize = 15, c = '#C79438')
plt.xlabel('District Names of Kerala')
plt.ylabel('Number of birds counted')
plt.xticks(rotation = 20)
plt.axhline(median_count, c = 'g', linestyle = '-.', linewidth = 1.5)
plt.text('Kottayam', median_count , f'Median count :: {median_count}', fontsize = 14, va = 'center', ha = 'center', backgroundcolor = 'w', c = 'r')

plt.savefig('images/bird_vs_district.svg', bbox_inches = 'tight', pad_inches = 0.3)
plt.show()

Observation

We have about 0.26% (167 Birds) who are classified as very rare dring the course of survey.
44 birds are classified as very common by showing 1,36,240 times during the survey period.
With over 35,000 bird watch Palakkad became one of the most bird watched district in Kerala

Resident Status of Birds¶

In [17]:

number_birds = []
resident_list = kba_species_df.Resident_status.unique()
for resident in resident_list:
    number_birds.append(len(kba_species_df[kba_species_df.Resident_status == resident]))

pd.DataFrame({'Resident Status of Birds' : resident_list, 'Number of Birds' : number_birds, '% Contribution to KBA Dataset' : (number_birds / np.sum(number_birds) * 100)}).style.set_caption('Categorizing birds based on their Resident Status').set_table_styles(styles).highlight_max(subset = ['Number of Birds', '% Contribution to KBA Dataset'], color = '#DBEAF6')

Out[17]:

Categorizing birds based on their Resident Status
	Resident Status of Birds	Number of Birds	% Contribution to KBA Dataset
0	Resident	249	68.975069
1	WinterMigrant	111	30.747922
2	SummerMigrant	1	0.277008

In [18]:

kba_species_df[kba_species_df.Resident_status == 'SummerMigrant'][['Common_Name', 'Scientific_Name', 'Resident_status', 'IUCN_Redlist_Status']].style.set_caption("The only 1 'SummerMigrant' in this record").set_table_styles(styles)

Out[18]:

The only 1 'SummerMigrant' in this record
	Common_Name	Scientific_Name	Resident_status	IUCN_Redlist_Status
311	Blue-cheeked Bee-eater	Merops persicus	SummerMigrant	Least Concern

In [19]:

kba_species_data['Resident_count'] = np.ones(len(kba_species_data))

Name_Resident_Count = pd.DataFrame(kba_species_data.groupby(['Common_Name', 'Resident_status'])['Resident_count'].sum()).reset_index().sort_values('Resident_count', ascending = False).reset_index(drop=True)

temp_df = pd.DataFrame(columns = ['Common_Name', 'Resident_status', 'Resident_count'])

for resident in resident_list:
    temp_df = pd.concat([temp_df, Name_Resident_Count[Name_Resident_Count.Resident_status == resident].head()])
    
temp_df.Resident_count = temp_df.Resident_count.astype('int')

temp_df = temp_df.reset_index(drop = True)

idx = pd.IndexSlice
temp_df.style.set_caption('Top 5 Birds from each Resident category').set_table_styles(styles).set_properties(**{'background-color': '#E5F3FA'}, subset = idx[idx[:4]]).set_properties(**{'background-color': '#FEEDFF'}, subset = idx[idx[5:9]]).set_properties(**{'background-color': '#E0ECE4'}, subset = idx[idx[10]])

Out[19]:

Top 5 Birds from each Resident category
	Common_Name	Resident_status	Resident_count
0	White-cheeked Barbet	Resident	13860
1	House Crow	Resident	12381
2	Large-billed Crow	Resident	9641
3	Common Myna	Resident	9147
4	Rufous Treepie	Resident	8655
5	Blyth's Reed Warbler	WinterMigrant	2753
6	Indian Golden Oriole	WinterMigrant	2387
7	Blue-tailed Bee-eater	WinterMigrant	1172
8	Barn Swallow	WinterMigrant	1108
9	Ashy Drongo	WinterMigrant	885
10	Blue-cheeked Bee-eater	SummerMigrant	5

Observation

About 68.9% (249) of the recorded birds are Resident one and 30.7% (111) were Winter Migrants
White-cheeked Barbet(13,860) and House Crow(12,381)are the most visited birds during the entire survey with morethan 10,000 times and both are Resident birds.
Blue-cheeked Bee-eater was the only one Summer Migrant bird and was visited for 5 time.

Wet & Dry Season birds counts¶

In [20]:

print(f'Number of unique bird families present : {text_co.blue}{kba_species_df.Family.nunique()}{text_co.reset}')
print('=' * 43)

# Total Birds visit
season_count = kba_data_df.Season.value_counts()
for seas, count in zip(season_count.index, season_count.values):
    print(f'Number of total Birds observed during {text_co.blue}{seas} season{text_co.reset} is {text_co.blue}:{text_co.reset} {text_co.blue}{count}{text_co.reset}')

print('=' * 60)

# Unique Birds visit
for season in kba_data_df.Season.unique():
    bird_group_season = kba_data_df.groupby('Season').get_group(season).Common_Name
    unique_count = bird_group_season.nunique()
    print(f'Number of unique Birds visited during the {text_co.blue}{season} season{text_co.reset} is {text_co.blue}:{text_co.reset} {text_co.blue}{unique_count}{text_co.reset}')

Number of unique bird families present : 76
===========================================
Number of total Birds observed during Dry season is : 169704
Number of total Birds observed during Wet season is : 131178
============================================================
Number of unique Birds visited during the Wet season is : 404
Number of unique Birds visited during the Dry season is : 475

In [21]:

# Wet Seaason DataFrame
wet_season_bird_df = pd.DataFrame(kba_data_df.groupby('Season').get_group('Wet').Common_Name.value_counts()).reset_index().rename(columns = {'index' : 'Bird_Name', 'Common_Name' : 'Wet_Visit_Count_Count'})
wet_season_bird_df = wet_season_bird_df[wet_season_bird_df.Wet_Visit_Count_Count != 0].sort_values('Wet_Visit_Count_Count', ascending = False)

# Dry Seaason DataFrame
dry_season_bird_df = pd.DataFrame(kba_data_df.groupby('Season').get_group('Dry').Common_Name.value_counts()).reset_index().rename(columns = {'index' : 'Bird_Name', 'Common_Name' : 'Dry_Visit_Count_Count'})
dry_season_bird_df = dry_season_bird_df[dry_season_bird_df.Dry_Visit_Count_Count != 0].sort_values('Dry_Visit_Count_Count', ascending = False)

wet_season_bird_count = set(wet_season_bird_df.Bird_Name.values)
dry_season_bird_count = set(dry_season_bird_df.Bird_Name.values)

print(f'Number of Birds who visited in {text_co.red}both{text_co.reset} {text_co.blue}Dry{text_co.reset} and {text_co.blue}Wet Season{text_co.blue} is {text_co.red}:{text_co.reset} {text_co.blue}{len(wet_season_bird_count.intersection(dry_season_bird_count))}{text_co.reset}')
print(f'Number of Birds who visited {text_co.red}only{text_co.reset} in {text_co.blue}Dry Season{text_co.reset} is {text_co.red}:{text_co.blue} {text_co.blue}{len(dry_season_bird_count.difference(wet_season_bird_count))}{text_co.reset}')
print(f'Number of Birds who visited {text_co.red}only{text_co.reset} in {text_co.blue}Wet Season{text_co.reset} is {text_co.red}:{text_co.blue} {text_co.blue}{len(wet_season_bird_count.difference(dry_season_bird_count))}{text_co.reset}')

Number of Birds who visited in both Dry and Wet Season is : 387
Number of Birds who visited only in Dry Season is : 88
Number of Birds who visited only in Wet Season is : 17

In [22]:

wet_season_bird_df.head(7).style.set_caption('Top 7 Birds visited during Wet Season').set_table_styles(styles)

Out[22]:

Top 7 Birds visited during Wet Season
	Bird_Name	Wet_Visit_Count_Count
0	House Crow	6297
1	White-cheeked Barbet	5281
2	Large-billed Crow	4656
3	Common Myna	4569
4	Rufous Treepie	4480
5	Greater Coucal	4344
6	Purple-rumped Sunbird	4220

In [23]:

dry_season_bird_df.head(7).style.set_caption('Top 7 Birds visited during Dry Season').set_table_styles(styles)

Out[23]:

Top 7 Birds visited during Dry Season
	Bird_Name	Dry_Visit_Count_Count
0	White-cheeked Barbet	8579
1	House Crow	6084
2	Large-billed Crow	4985
3	Common Myna	4578
4	Red-whiskered Bulbul	4236
5	Rufous Treepie	4175
6	Greater Coucal	3921

In [24]:

# Extract Year and Month for further analysis
# https://datagy.io/pandas-extract-date-from-datetime/
kba_data_df['Year'] = kba_data_df.Date.dt.year
kba_data_df['Month'] = kba_data_df.Date.dt.month_name()


#Lets plot some figure to get intutions
plt.figure(figsize=(15,8))
plt.suptitle('Number of Birds visited per Year chart', fontsize = 30, c = '#1D32E2')
plt.title('On each season', fontsize = 15, c = '#C79438')
plt.xlabel('Year of observation')
plt.ylabel('Number of birds visited')
sns.histplot(data = kba_data_df, x = 'Year', hue = 'Season', multiple = 'stack', palette = ['red', 'green'])
plt.show()

In [25]:

grouped = kba_data_df.groupby('Year')

img_index = 1
plt.figure(figsize=(20,20))
plt.suptitle('Monthly season wise Birds count distribution', fontsize = 30, c = '#1D32E2')
for name, group in grouped:
    plt.subplot(2, 3, img_index)
    plt.title(f' Year : {name}')
    sns.histplot(data = group, x = 'Month', hue = 'Season', multiple = 'stack')
    plt.xticks(rotation = 15)
    img_index += 1

plt.show()    

In [26]:

# https://pandas.pydata.org/docs/reference/api/pandas.DatetimeIndex.weekday.html
# The day of the week with Monday=0, Sunday=6.

bird_watch_weekEnd = (kba_data_df.Date.dt.weekday >=5).value_counts()[0]
percent_weekend = int(bird_watch_weekEnd*100/len(kba_data_df))
print(f'Around {text_co.red}{percent_weekend}%{text_co.reset} of whole the survey were done during week-ends.')

Around 55% of whole the survey were done during week-ends.

In [27]:

percnt_over_year = pd.DataFrame(columns = ['Year', 'Wet_Bird_Count', 'Dry_Bird_Count'], index=range(6))

for se, co in zip(['Dry', 'Wet'], ['Dry_Bird_Count', 'Wet_Bird_Count']):
    if se == 'Dry': a = 1
    else : a = 0
    for na, gr in kba_data_df[kba_data_df.Season == se].groupby('Year'):
        percnt_over_year.loc[a].Year = na
        percnt_over_year.loc[a][co] = gr.shape[0]
        a += 1

percnt_over_year.fillna(0, inplace = True)
percnt_over_year.insert(2, 'Wet_Cum_Percentage', percnt_over_year.Wet_Bird_Count.cumsum() / percnt_over_year.Wet_Bird_Count.sum())
percnt_over_year.insert(4, 'Dry_Cum_Percentage', percnt_over_year.Dry_Bird_Count.cumsum() / percnt_over_year.Dry_Bird_Count.sum())

In [28]:

# https://stackoverflow.com/a/41924823

plt.figure(figsize=(14, 8))

plt.suptitle('Percentage Bird watch completed in each year', fontsize = 30, c = '#1D32E2')
plt.title('(Cumulative Percentage)', fontsize = 15, c = '#C79438')
plt.xlabel('Year of observation')
plt.ylabel('Percentage completed')

sns.lineplot(data = percnt_over_year, y = 'Wet_Cum_Percentage', x = 'Year', color = 'red', label = 'Wet Season')
sns.lineplot(data = percnt_over_year, y = 'Dry_Cum_Percentage', x = 'Year', color = 'blue', label = 'Dry Season')
plt.axhline(0.92, linewidth=1, linestyle = '-.', c = 'g')
plt.axhline(0.73, linewidth=1, linestyle = '-.', c = 'g')

plt.text(2016, 0.92, '92% Cumulative', fontsize = 10, va = 'center', ha = 'center', backgroundcolor = 'w', c = 'g')
plt.text(2017, 0.73, '73% Cumulative', fontsize = 10, va = 'center', ha = 'center', backgroundcolor = 'w', c = 'g')

plt.legend(loc = 5)
plt.show()

Observation

There are 404 and 475 unique birds visited during the wet and dry seasons respectively.
We have counted 88 birds those and those only came during dry season, 17 birds only during wet season.
White-cheeked Barbet, House Crow, Large-billed Crow and Common Myna are the top 4 birds by visit count on during both wet and dry seasons.
Dry season survey beagn in 2016 and it concluded in 2020.
Most of the Wet season survey happened before 2018.
In 2017 alone observers watched morethan 1,00,000 birds.
About 92% of the wet season survey were done during 2015 and 2017.
Nearly 73% of the dry season survey completed in between 2016 and 2018.
Around 55% of whole the survey were done during week-ends

Bird Watching Timings¶

We do have a time data in-hand. Let's check what is the time the data collectors used mostly.

In [29]:

def df_to_hour(df):
    # return pd.to_datetime(df.Time).dt.hour
    return df.Time_Hour

In [30]:

all_season = df_to_hour(kba_data_df)
wet_season_hr= df_to_hour(kba_data_df[kba_data_df['Season'] == 'Wet'])
dry_season_hr= df_to_hour(kba_data_df[kba_data_df['Season'] == 'Dry'])

season_time = [(all_season, 'All Season'), (wet_season_hr, 'Wet Season'), (dry_season_hr, 'Dry Season')]

plt.figure(figsize=(20, 8))
img_index = 1
for data, sea in season_time:
    plt.suptitle('Bird Watch Timings', fontsize = 30, c = '#1D32E2')
    plt.subplot(1, 3, img_index)
    plt.title(sea)
    sns.histplot(data = data, bins = 24)
    img_index += 1
plt.show()

Observation

Majority of the data is collected during morning and that too between 6-10am.
The next most data collected time is between 3-6pm.
Eventough there are few data collections are done in noon time.
This shows the best time for bird watching is on morning

Endemic and Threatened Species¶

Endemic Birds are species which are found only in one geographic region and nowhere else in the world.
The Western Ghats is one such area and is one of the world's ten "hottest biodiversity hotsports".

Species from Western Ghats¶

In [31]:

endemicity_ = pd.DataFrame(kba_species_df.Endemicity.value_counts()).reset_index().rename(columns = {'Endemicity' : 'Count', 'index' : 'label'})

fig = plt.figure()
ax = fig.add_axes([0,0,1.25,1.25])
ax.axis('equal')
plt.pie(endemicity_.Count, labels = endemicity_.label, autopct = '%1.2f%%', colors = ['#2ca12d', '#f77e12'])
plt.title("Bird's Endemicity", fontsize = 30, c = '#1D32E2')
plt.show()

In [32]:

def processDF(df, endimic, count_name):
    
    columns_ = ['Common_Name', 'Resident_status', 'IUCN_Redlist_Status', 'Family']
    df = df[df.Endemicity == endimic][columns_].reset_index(drop = True)
    iucn_df = pd.DataFrame(df.IUCN_Redlist_Status.value_counts()).reset_index().rename(columns = {'index' : 'IUCN_Status', 'IUCN_Redlist_Status' : count_name})
    
    return df, iucn_df

In [33]:

western_birds, western_iucn = processDF(kba_species_df, 'Western Ghats', 'Western_G')
_, non_western_iucn = processDF(kba_species_df, 'Not endemic', 'non_Western_G')

print(f'\nNumber of different species families in {text_co.blue}Western Ghats {text_co.reset} : {text_co.red}{western_birds.Family.nunique()}{text_co.reset}\n')

western_birds.sort_values('IUCN_Redlist_Status').reset_index(drop = True).style.set_caption('All Birds from Western Ghats').set_table_styles(styles).set_properties(**{'background-color': '#FEEDFF'}, subset = idx[idx[:1]]).set_properties(**{'background-color': '#E5F3FA'}, subset = idx[idx[2:25]]).set_properties(**{'background-color': '#FBFDEE'}, subset = idx[idx[26:27]]).set_properties(**{'background-color': '#E0ECE4'}, subset = idx[idx[28:]])

Number of different species families in Western Ghats  : 21

Out[33]:

All Birds from Western Ghats
	Common_Name	Resident_status	IUCN_Redlist_Status	Family
0	Banasura Laughingthrush	Resident	Endangered	Leiothrichidae
1	Nilgiri Laughingthrush	Resident	Endangered	Leiothrichidae
2	White-bellied Treepie	Resident	Least Concern	Corvidae
3	Black-and-orange Flycatcher	Resident	Least Concern	Muscicapidae
4	Square-tailed Bulbul	Resident	Least Concern	Pycnonotidae
5	Crimson-backed Sunbird	Resident	Least Concern	Nectariniidae
6	Wayanad Laughingthrush	Resident	Least Concern	Leiothrichidae
7	Grey-fronted Green-Pigeon	Resident	Least Concern	Columbidae
8	Orange Minivet	Resident	Least Concern	Campephagidae
9	Legge's Hawk-Eagle	Resident	Least Concern	Accipitridae
10	Indian Swiftlet	Resident	Least Concern	Apodidae
11	Malabar Barbet	Resident	Least Concern	Megalaimidae
12	Malabar Grey Hornbill	Resident	Least Concern	Bucerotidae
13	Rufous Babbler	Resident	Least Concern	Leiothrichidae
14	Malabar Parakeet	Resident	Least Concern	Psittaculidae
15	Malabar Lark	Resident	Least Concern	Alaudidae
16	Malabar Woodshrike	Resident	Least Concern	Vangidae
17	White-bellied Blue Flycatcher	Resident	Least Concern	Muscicapidae
18	Dark-fronted Babbler	Resident	Least Concern	Timaliidae
19	Hill Swallow	Resident	Least Concern	Hirundinidae
20	Malabar Starling	Resident	Least Concern	Sturnidae
21	Flame-throated Bulbul	Resident	Least Concern	Pycnonotidae
22	Nilgiri Thrush	Resident	Least Concern	Turdidae
23	Nilgiri Flycatcher	Resident	Least Concern	Muscicapidae
24	Nilgiri Flowerpecker	Resident	Least Concern	Dicaeidae
25	Yellow-browed Bulbul	Resident	Least Concern	Pycnonotidae
26	Grey-headed Bulbul	Resident	Near Threatened	Pycnonotidae
27	Palani Laughingthrush	Resident	Near Threatened	Leiothrichidae
28	Nilgiri Pipit	Resident	Vulnerable	Motacillidae
29	Broad-tailed Grassbird	Resident	Vulnerable	Locustellidae
30	White-bellied Sholakili	Resident	Vulnerable	Muscicapidae
31	Ashambu Laughingthrush	Resident	Vulnerable	Leiothrichidae
32	Nilgiri Wood-Pigeon	Resident	Vulnerable	Columbidae

In [34]:

western_birds_only = pd.merge(kba_data_df[['Common_Name', 'Season', 'County']], western_birds, on = 'Common_Name', how = 'inner')
pd.DataFrame(western_birds_only.Common_Name.value_counts()).reset_index().rename(columns = {'Common_Name' : 'Visit_Count_During_Survey', 'index' : 'Common_Name'}).head(6).style.set_caption('Endemic Species with 1000+ Visit').set_table_styles(styles)

Out[34]:

Endemic Species with 1000+ Visit
	Common_Name	Visit_Count_During_Survey
0	Yellow-browed Bulbul	2687
1	Nilgiri Flowerpecker	2288
2	Orange Minivet	2286
3	Crimson-backed Sunbird	1781
4	Malabar Parakeet	1443
5	Malabar Grey Hornbill	1346

In [35]:

iucn_status = pd.merge(western_iucn, non_western_iucn, how = 'outer').fillna(0)

iucn_status['Western_G'] = iucn_status['Western_G'].apply(np.int64)

iucn_status.insert(2, '% Western_G', iucn_status['Western_G'].apply(lambda x : round(x *100 / sum(iucn_status.Western_G), 2)))
iucn_status.insert(4, '% Non_Western_G', iucn_status['non_Western_G'].apply(lambda x : round(x *100 / sum(iucn_status.non_Western_G), 2)))

iucn_status.style.set_caption('Endemic Species IUCN Status Count').set_table_styles(styles).set_table_styles(styles).highlight_max(subset = ['Western_G', '% Western_G', 'non_Western_G', '% Non_Western_G'], color = '#DBEAF6')

Out[35]:

Endemic Species IUCN Status Count
	IUCN_Status	Western_G	% Western_G	non_Western_G	% Non_Western_G
0	Least Concern	24	72.730000	303	92.380000
1	Vulnerable	5	15.150000	6	1.830000
2	Near Threatened	2	6.060000	16	4.880000
3	Endangered	2	6.060000	1	0.300000
4	Critically Endangered	0	0.000000	2	0.610000

In [36]:

district_wise_count = pd.DataFrame(western_birds_only.County.value_counts()).reset_index().rename(columns = {'index' : 'District_Name', 'County' : 'Bird_Count'}).sort_values('Bird_Count', ascending = True).reset_index(drop=True)
median_count = int(district_wise_count.Bird_Count.median())

plt.figure(figsize=(20,8))
sns.lineplot(x = district_wise_count.District_Name, y= district_wise_count.Bird_Count)
plt.suptitle('Western Ghats Bird Count vs Districts', fontsize = 30, c = '#1D32E2')
plt.title('Total number of birds counted from each district of Kerala', fontsize = 15, c = '#C79438')
plt.xlabel('District Names of Kerala')
plt.ylabel('Number of birds counted')
plt.xticks(rotation = 20)
plt.axhline(median_count, c = 'g', linestyle = '-.', linewidth = 1.5)
plt.text('Kottayam', median_count , f'Median count :: {median_count}', fontsize = 14, va = 'center', ha = 'center', backgroundcolor = 'w', c = 'r')

plt.show()

Observation

The data shows around 9% of the birds regularly found in Western Ghats region.
Out of 76 bird families 21 were regularly found in Western Ghats.
In total 33 species from 21 families were endimic to Western Ghats.
6 endimic species have over 1000+ occurances in the dataset
The Vulnerable and Endangered species birds are more attracted towards the Western Ghats.

Kerala Bird Atlas (KBA) -EDA¶

Objective¶

Methodology¶

Spatial extent¶

Sampling protocol¶

Dataset¶

More about Dataset¶

Reading the Datasets¶

Missing Values¶

Generating New Features¶

Number of Distinct Birds Observed¶

Observation

Categorizing Birds Based on the Occurance of Records (Both Dry and Wet Season Combined)¶

Observation

Resident Status of Birds¶

Observation

Wet & Dry Season birds counts¶

Observation

Bird Watching Timings¶

Observation

Endemic and Threatened Species¶

Species from Western Ghats¶

Observation

Kerala Bird Atlas (KBA) -EDA
¶