Columns Needed to import from MongoDB:
file_list_columns = ['EIN', 'OrganizationName', 'URL', 'SubmittedOn', 'TaxPeriod']
SOX_columns = ['WhistleblowerPolicy', 'WhistleblowerPolicyInd', 'ConflictOfInterestPolicyInd', 'ConflictOfInterestPolicy', 'DocumentRetentionPolicy', 'DocumentRetentionPolicyInd',]
revenue_columns = ['CYTotalRevenueAmt', 'TotalRevenueCurrentYear']
efficiency_columns = ['CYTotalExpensesAmt', 'TotalExpensesCurrentYear' 'TotalProgramServiceExpense', 'TotalProgramServiceExpensesAmt']
complexity_columns = ['CYContributionsGrantsAmt', 'FederatedCampaigns', 'FederatedCampaignsAmt', 'MembershipDues', 'MembershipDuesAmt', 'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp', 'FundraisingAmt', 'FundraisingGrossIncomeAmt', 'RelatedOrganizations', 'RelatedOrganizationsAmt', 'GovernmentGrants', 'GovernmentGrantsAmt', 'PYProgramServiceRevenueAmt','CYProgramServiceRevenueAmt', 'ProgramServiceRevenueCY', 'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt', 'CYInvestmentIncomeAmt', 'InvestmentIncomeCurrentYear', 'OtherRevenueMiscGrp', 'OtherRevenueTotalAmt', 'CYOtherRevenueAmt', 'TotalOtherRevenue', ]
cols = file_list_columns+SOX_columns+revenue_columns+efficiency_columns+complexity_columns
Columns Needed to keep to merge with existing data:
file_list_columns = ['EIN', 'OrganizationName', 'URL', 'SubmittedOn', 'TaxPeriod', 'FYE']
SOX_columns = ['whistleblower_policy', 'conflict_of_interest_policy', 'records_retention_policy', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary']
control_columns = ['tot_rev', 'total_expenses', 'program_expenses', 'complexity']
unneeded_columns = ['CYTotalExpensesAmt', u'TotalExpensesCurrentYear', u'TotalFunctionalExpenses', 'TotalFunctionalExpensesGrp', u'TotalProgramServiceExpense', u'TotalProgramServiceExpensesAmt', 'CYContributionsGrantsAmt', u'FederatedCampaigns', u'FederatedCampaignsAmt', u'MembershipDues', 'MembershipDuesAmt', u'NetIncomeFromFundraisingEvents', u'NetIncmFromFundraisingEvtGrp', 'FundraisingAmt', 'FundraisingGrossIncomeAmt', u'RelatedOrganizations', 'RelatedOrganizationsAmt', u'GovernmentGrants', 'GovernmentGrantsAmt', 'PYProgramServiceRevenueAmt', u'CYProgramServiceRevenueAmt', 'ProgramServiceRevenue', 'ProgramServiceRevenueCY', u'ProgramServiceRevenueGrp', u'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt', u'CYInvestmentIncomeAmt', u'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp', u'OtherRevenueTotalAmt', 'OtherRevenueMiscGrp', u'TotalOtherRevenue', 'CYOtherRevenueAmt', 'fundraising', 'program_revenue', 'OtherRevMisc', 'other_revenue_binary']
cols = file_list_columns+SOX_columns+control_columns
import sys
import time
import json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sqlite3 as lite
from pandas import DataFrame
from pandas import Series
#http://pandas.pydata.org/pandas-docs/stable/options.html
pd.set_option('display.max_columns', None)
pd.set_option('max_colwidth', 250)
plt.rcParams['figure.figsize'] = (15, 5)
%matplotlib inline
cd '/Users/gregorysaxton/Google Drive/SOX'
/Users/gregorysaxton/Google Drive/SOX
dfx = pd.read_excel('advisories.xls')
dfx[:1]
org_id | EIN | advisory text - current advisory | classification -- Greg | classification -- Dan | advisory_text_0916 | level_0916 | current_ratings_url | date_0916 | name_0916 | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 16722 | 20503776 | On August 1, 2016, the New Hampshire Union Leader published an article titled, "Former Portsmouth youth softball president accused of stealing thousands from nonprofit." | potential material diversion | NaN | On August 1, 2016, the New Hampshire Union Leader reported that Portsmouth Girls Softball Association was, or may continue to be, a subject of a material diversion of assets in an article titled, "Former Portsmouth youth softball president accus... | medium | http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=16722 | 08/12/16 | Portsmouth Girls Softball Association |
dfx['classification -- Greg'].value_counts()
zero FR 15 potential material diversion 12 fundraising by non-charity 7 potential corruption/impropriety 4 potential financial violations 3 potential fundraising impropriety 2 tax-exempt status revoked 2 potential fraud 2 potential corruption 1 misuse of federal grant money 1 potential fraud/embezzlement 1 fundraising by non-charity; charity shut down 1 potential improper conduct 1 under investigation 1 tax-exempt status revoked; fraud 1 non-filer 1 potential bribery 1 potential corruption/impropriety -- and charity potentially shut down 1 potential corruption/impropriety/fraud -- and charity potentially shut down 1 molestation lawsuit 1 I DON'T THINK THIS ONE CAN BE THE CORRECT ADVISORY TEXT!!! 1 whistleblower complaint of fraud 1 Name: classification -- Greg, dtype: int64
len(dfx)
321
Depending on the project, I will store the data in SQLite or MongoDB. This time I'll use MongoDB -- it's great for storing JSON data where each observation could have different variables. Before we get to the interesting part the following code blocks set up the MongoDB environment and the new database we'll be using.
Note: In a terminal we'll have to start MongoDB by running the command mongod or sudo mongod. Then we run the following code block here to access MongoDB.
import pymongo
from pymongo import MongoClient
client = MongoClient()
f = open('2016 donor advisory EINs.json', 'r')
advisories_2016 = json.load(f)
print len(advisories_2016)
print advisories_2016[:10]
321 [nan, nan, u'311660730', u'432059591', u'320022295', u'870804117', nan, u'223871535', u'620476294', u'311016441']
print len([str(x) for x in advisories_2016 if str(x) != 'nan'])
advisories_2016 = [str(x) for x in advisories_2016 if str(x) != 'nan']
print len(advisories_2016)
print advisories_2016[:5]
256 256 ['311660730', '432059591', '320022295', '870804117', '223871535']
There are 1,077 filings for those 256 EINs
file_list.find({'EIN': { '$in': advisories_2016}}).count()
1077
# DEFINE MY COLLECTION (DATABASE TABLE) WHERE I'LL INSERT THE MESSAGES
filings = db['filings']
filings.count()
0
db.filings.create_index([('URL', pymongo.ASCENDING)], unique=True)
u'URL_1'
list(db.filings.index_information())
[u'_id_', u'URL_1']
df = pd.DataFrame(list(filings.find()))
print '# of columns:', len(df.columns)
print '# of observations:', len(df)
df.head(1)
# of columns: 671 # of observations: 538
@documentId | @referenceDocumentId | @referenceDocumentName | @softwareId | @softwareVersion | @softwareVersionNum | AccountantCompileOrReview | AccountantCompileOrReviewBasis | AccountantCompileOrReviewInd | AccountsPayableAccrExpnssGrp | AccountsPayableAccruedExpenses | AccountsReceivable | AccountsReceivableGrp | AcctCompileOrReviewBasisGrp | ActivitiesConductedPartnership | ActivitiesConductedPrtshpInd | Activity2 | Activity3 | ActivityOrMissionDesc | ActivityOrMissionDescription | ActivityOther | AddressChange | AddressChangeInd | AddressPrincipalOfficerUS | Advertising | AdvertisingGrp | AllAffiliatesIncluded | AllAffiliatesIncludedInd | AllOtherContributions | AllOtherContributionsAmt | AllOtherExpenses | AllOtherExpensesGrp | AmendedReturn | AmendedReturnInd | AnnualDisclosureCoveredPersons | AnnualDisclosureCoveredPrsnInd | AuditCommittee | AuditCommitteeInd | AuditedFinancialStmtAttInd | BackupWthldComplianceInd | BalanceSheetAmountsReported | BenefitsPaidToMembersCY | BenefitsPaidToMembersPriorYear | BenefitsToMembers | BenefitsToMembersGrp | BooksInCareOfDetail | BsnssRltnshpThruFamilyMember | BsnssRltnshpWithOrganization | BusinessRlnWithFamMemInd | BusinessRlnWithOfficerEntInd | BusinessRlnWithOrgMemInd | CYBenefitsPaidToMembersAmt | CYContributionsGrantsAmt | CYGrantsAndSimilarPaidAmt | CYInvestmentIncomeAmt | CYOtherExpensesAmt | CYOtherRevenueAmt | CYProgramServiceRevenueAmt | CYRevenuesLessExpensesAmt | CYSalariesCompEmpBnftPaidAmt | CYTotalExpensesAmt | CYTotalFundraisingExpenseAmt | CYTotalProfFndrsngExpnsAmt | CYTotalRevenueAmt | CapStckTrstPrinCurrentFunds | CapStkTrPrinCurrentFundsGrp | CashNonInterestBearing | CashNonInterestBearingGrp | ChangeToOrgDocumentsInd | ChangesToOrganizingDocs | CntrbtnsRprtdFundraisingEvents | CntrctRcvdGreaterThan100KCnt | CollectionsOfArt | CollectionsOfArtInd | CompCurrentOfcrDirectorsGrp | CompCurrentOfficersDirectors | CompDisqualPersons | CompDisqualPersonsGrp | CompensationFromOtherSources | CompensationFromOtherSrcsInd | CompensationProcessCEO | CompensationProcessCEOInd | CompensationProcessOther | CompensationProcessOtherInd | ComplianceWithBackupWitholding | ConferencesMeetings | ConferencesMeetingsGrp | ConflictOfInterestPolicy | ConflictOfInterestPolicyInd | ConservationEasements | ConservationEasementsInd | ConsolidatedAuditFinancialStmt | ConsolidatedAuditFinclStmtInd | ContractorCompensation | ContractorCompensationGrp | ContriRptFundraisingEventAmt | ContributionsGrantsCurrentYear | ContributionsGrantsPriorYear | CostOfGoodsSold | CostOfGoodsSoldAmt | CreditCounseling | CreditCounselingInd | DAFExcessBusinessHoldingsInd | DLN | DecisionsSubjectToApprovaInd | DecisionsSubjectToApproval | DeductibleArtContributionInd | DeductibleContributionsOfArt | DeductibleNonCashContriInd | DeductibleNonCashContributions | DeferredRevenue | DeferredRevenueGrp | DelegationOfManagementDuties | DelegationOfMgmtDutiesInd | DepreciationDepletion | DepreciationDepletionGrp | Desc | DescribedIn501C3 | DescribedInSection501c3Ind | Description | DisregardedEntity | DisregardedEntityInd | DistributionToDonor | DistributionToDonorInd | DoNotFollowSFAS117 | DocumentRetentionPolicy | DocumentRetentionPolicyInd | DoingBusinessAs | DoingBusinessAsName | DonatedServicesAndUseFcltsAmt | DonorAdvisedFundInd | DonorAdvisedFunds | EIN | ElectionOfBoardMembers | ElectionOfBoardMembersInd | EmployeeCnt | EmploymentTaxReturnsFiled | EmploymentTaxReturnsFiledInd | EngagedInExcessBenefitTransInd | EscrowAccount | EscrowAccountInd | EscrowAccountLiability | EscrowAccountLiabilityGrp | ExcessBenefitTransaction | ExcessBusinessHoldings | ExcessBusinessHoldingsInd | Expense | ExpenseAmt | FSAudited | FSAuditedBasis | FSAuditedBasisGrp | FSAuditedInd | FamilyOrBusinessRelationship | FamilyOrBusinessRlnInd | FederalGrantAuditPerformed | FederalGrantAuditPerformedInd | FederalGrantAuditRequired | FederalGrantAuditRequiredInd | FederatedCampaigns | FederatedCampaignsAmt | FeesForServicesAccounting | FeesForServicesAccountingGrp | FeesForServicesInvstMgmntFees | FeesForServicesLegal | FeesForServicesLegalGrp | FeesForServicesLobbying | FeesForServicesLobbyingGrp | FeesForServicesManagement | FeesForServicesManagementGrp | FeesForServicesOther | FeesForServicesOtherGrp | FeesForServicesProfFundraising | FeesForSrvcInvstMgmntFeesGrp | FiledLieu1041 | FinancialStatementBoth | FinancialStatementConsolidated | FinancialStatementSeparate | FinancialStmtAttached | FollowSFAS117 | ForeignActivities | ForeignActivitiesInd | ForeignAddress | ForeignCountryCd | ForeignFinancialAccount | ForeignFinancialAccountInd | ForeignGrants | ForeignGrantsGrp | ForeignOffice | ForeignOfficeInd | Form1098CFiled | Form1098CFiledInd | Form720Filed | Form8282FiledCnt | Form8282PropertyDisposedOf | Form8282PropertyDisposedOfInd | Form8886TFiled | Form8886TFiledInd | Form8899Filed | Form8899Filedind | Form990-TFiled | Form990PartVIISectionA | Form990PartVIISectionAGrp | Form990ProvidedToGoverningBody | Form990ProvidedToGvrnBodyInd | Form990TFiledInd | FormType | FormationYr | FormerOfcrEmployeesListedInd | FormersListed | FundraisingActivities | FundraisingActivitiesInd | FundraisingAmt | FundraisingDirectExpenses | FundraisingDirectExpensesAmt | FundraisingEvents | FundraisingGrossIncomeAmt | FundsToPayPremiums | GainOrLoss | GainOrLossGrp | Gaming | GamingActivitiesInd | GoverningBodyVotingMembersCnt | GovernmentGrants | GovernmentGrantsAmt | GrantAmt | GrantToRelatedPerson | GrantToRelatedPersonInd | Grants | GrantsAndSimilarAmntsCY | GrantsAndSimilarAmntsPriorYear | GrantsPayable | GrantsPayableGrp | GrantsToDomesticIndividuals | GrantsToDomesticIndividualsGrp | GrantsToDomesticOrgs | GrantsToDomesticOrgsGrp | GrantsToIndividuals | GrantsToIndividualsInd | GrantsToOrganizations | GrantsToOrganizationsInd | GrossAmountSalesAssets | GrossAmountSalesAssetsGrp | GrossIncomeFundraisingEvents | GrossReceipts | GrossReceiptsAmt | GrossRents | GrossRentsGrp | GrossSalesOfInventory | GrossSalesOfInventoryAmt | GroupExemptionNum | GroupExemptionNumber | GroupReturnForAffiliates | GroupReturnForAffiliatesInd | Hospital | IRPDocumentCnt | IRPDocumentW2GCnt | IncludeFIN48FootnoteInd | IncmFromInvestBondProceedsGrp | IncomeFromInvestBondProceeds | IndependentAuditFinancialStmt | IndependentAuditFinclStmtInd | IndependentVotingMemberCnt | IndivRcvdGreaterThan100KCnt | IndoorTanningServices | IndoorTanningServicesInd | InfoInScheduleOPartIII | InfoInScheduleOPartIIIInd | InfoInScheduleOPartIX | InfoInScheduleOPartIXInd | InfoInScheduleOPartV | InfoInScheduleOPartVI | InfoInScheduleOPartVII | InfoInScheduleOPartVIII | InfoInScheduleOPartVIIIInd | InfoInScheduleOPartVIIInd | InfoInScheduleOPartVIInd | InfoInScheduleOPartVInd | InfoInScheduleOPartX | InfoInScheduleOPartXI | InfoInScheduleOPartXII | InfoInScheduleOPartXIIInd | InfoInScheduleOPartXIInd | InfoInScheduleOPartXInd | InformationTechnology | InformationTechnologyGrp | InitialReturn | Insurance | InsuranceGrp | IntangibleAssets | IntangibleAssetsGrp | Interest | InterestGrp | InventoriesForSaleOrUse | InventoriesForSaleOrUseGrp | InvestTaxExemptBonds | InvestTaxExemptBondsInd | InvestmentExpenseAmt | InvestmentInJointVenture | InvestmentInJointVentureInd | InvestmentIncome | InvestmentIncomeCurrentYear | InvestmentIncomeGrp | InvestmentIncomePriorYear | InvestmentsOtherSecurities | InvestmentsOtherSecuritiesGrp | InvestmentsProgramRelated | InvestmentsProgramRelatedGrp | InvestmentsPubTradedSecGrp | InvestmentsPubTradedSecurities | IsAvailable | IsElectronic | JointCosts | JointCostsInd | LandBldgEquipAccumDeprecAmt | LandBldgEquipBasisNetGrp | LandBldgEquipCostOrOtherBssAmt | LandBldgEquipmentAccumDeprec | LandBuildingsEquipmentBasis | LandBuildingsEquipmentBasisNet | LastUpdated | LegalDomicileStateCd | LessCostOthBasisSalesExpenses | LessCostOthBasisSalesExpnssGrp | LessRentalExpenses | LessRentalExpensesGrp | LicensedMoreThanOneState | LicensedMoreThanOneStateInd | LoanOutstandingInd | LoanToOfficerOrDQP | LoansFromOfficersDirectors | LoansFromOfficersDirectorsGrp | LobbyingActivities | LobbyingActivitiesInd | LocalChapters | LocalChaptersInd | MaterialDiversionOrMisuse | MaterialDiversionOrMisuseInd | MembersOrStockholders | MembersOrStockholdersInd | MembershipDues | MembershipDuesAmt | MethodOfAccountingAccrual | MethodOfAccountingAccrualInd | MethodOfAccountingCash | MethodOfAccountingCashInd | MethodOfAccountingOther | MethodOfAccountingOtherInd | MinutesOfCommittees | MinutesOfCommitteesInd | MinutesOfGoverningBody | MinutesOfGoverningBodyInd | MiscellaneousRevenue | MiscellaneousRevenueGrp | MissionDesc | MissionDescription | MoreThan5000KToIndividuals | MoreThan5000KToIndividualsInd | MoreThan5000KToOrgInd | MoreThan5000KToOrganizations | MortNotesPyblSecuredInvestProp | MortgNotesPyblScrdInvstPropGrp | NameOfForeignCountry | NameOfPrincipalOfficerBusiness | NameOfPrincipalOfficerPerson | NbrIndependentVotingMembers | NbrVotingGoverningBodyMembers | NbrVotingMembersGoverningBody | NetAssetsOrFundBalancesBOY | NetAssetsOrFundBalancesBOYAmt | NetAssetsOrFundBalancesEOY | NetAssetsOrFundBalancesEOYAmt | NetGainOrLossInvestments | NetGainOrLossInvestmentsGrp | NetIncmFromFundraisingEvtGrp | NetIncomeFromFundraisingEvents | NetIncomeFromGaming | NetIncomeFromGamingGrp | NetIncomeOrLoss | NetIncomeOrLossGrp | NetRentalIncomeOrLoss | NetRentalIncomeOrLossGrp | NetUnrelatedBusTxblIncmAmt | NetUnrelatedBusinessTxblIncome | NetUnrlzdGainsLossesInvstAmt | NoListedPersonsCompensated | NoListedPersonsCompensatedInd | NonDeductibleContributions | NonDeductibleDisclosure | NoncashContributions | NoncashContributionsAmt | NondeductibleContriDisclInd | NondeductibleContributionsInd | NumberFormsTransmittedWith1096 | NumberIndependentVotingMembers | NumberIndividualsGT100K | NumberOf8282Filed | NumberOfContractorsGT100K | NumberOfEmployees | NumberW2GIncluded | ObjectId | Occupancy | OccupancyGrp | OfficeExpenses | OfficeExpensesGrp | OfficerEntityWithBsnssRltnshp | OfficerMailingAddress | OfficerMailingAddressInd | OnBehalfOfIssuer | OnBehalfOfIssuerInd | OperateHospitalInd | OrgDoesNotFollowSFAS117Ind | OrgFiledInLieuOfForm1041Ind | Organization501c | Organization501c3 | Organization501c3Ind | Organization501cInd | OrganizationFollowsSFAS117Ind | OrganizationName | OthNotesLoansReceivableNetGrp | OtherAssetsTotal | OtherAssetsTotalGrp | OtherChangesInNetAssetsAmt | OtherEmployeeBenefits | OtherEmployeeBenefitsGrp | OtherExpensePriorYear | OtherExpenses | OtherExpensesCurrentYear | OtherExpensesGrp | OtherExplainInSchO | OtherInd | OtherLiabilities | OtherLiabilitiesGrp | OtherNotesLoansReceivableNet | OtherRevenueCurrentYear | OtherRevenueMisc | OtherRevenueMiscGrp | OtherRevenuePriorYear | OtherRevenueTotalAmt | OtherSalariesAndWages | OtherSalariesAndWagesGrp | OtherWebsite | OtherWebsiteInd | OwnWebsite | OwnWebsiteInd | PYBenefitsPaidToMembersAmt | PYContributionsGrantsAmt | PYExcessBenefitTransInd | PYGrantsAndSimilarPaidAmt | PYInvestmentIncomeAmt | PYOtherExpensesAmt | PYOtherRevenueAmt | PYProgramServiceRevenueAmt | PYRevenuesLessExpensesAmt | PYSalariesCompEmpBnftPaidAmt | PYTotalExpensesAmt | PYTotalProfFndrsngExpnsAmt | PYTotalRevenueAmt | PaidInCapSrplsLandBldgEqpFund | PartialLiquidation | PartialLiquidationInd | PayPremiumsPrsnlBnftCntrctInd | PaymentsToAffiliates | PaymentsToAffiliatesGrp | PayrollTaxes | PayrollTaxesGrp | PdInCapSrplsLandBldgEqpFundGrp | PensionPlanContributions | PensionPlanContributionsGrp | PermanentlyRestrictedNetAssets | PermanentlyRstrNetAssetsGrp | PledgesAndGrantsReceivable | PledgesAndGrantsReceivableGrp | PoliciesReferenceChapters | PoliciesReferenceChaptersInd | PoliticalActivities | PoliticalCampaignActyInd | PremiumsPaid | PrepaidExpensesDeferredCharges | PrepaidExpensesDefrdChargesGrp | PrincipalOfficerNm | PriorExcessBenefitTransaction | PriorPeriodAdjustmentsAmt | ProfessionalFundraising | ProfessionalFundraisingInd | ProgSrvcAccomActy2Grp | ProgSrvcAccomActy3Grp | ProgSrvcAccomActyOtherGrp | ProgramServiceRevenue | ProgramServiceRevenueCY | ProgramServiceRevenueGrp | ProgramServiceRevenuePriorYear | ProhibitedTaxShelterTrans | ProhibitedTaxShelterTransInd | PymtTravelEntrtnmntPubOfclGrp | QuidProQuoContriDisclInd | QuidProQuoContributions | QuidProQuoContributionsInd | QuidProQuoDisclosure | RcvFndsToPayPrsnlBnftCntrctInd | RcvblFromDisqualifiedPrsnGrp | ReceivablesFromDisqualPersons | ReceivablesFromOfficersEtc | ReceivablesFromOfficersEtcGrp | ReconcilationDonatedServices | ReconcilationInvestExpenses | ReconcilationOtherChanges | ReconcilationPriorAdjustment | ReconcilationRevenueExpenses | ReconcilationRevenueExpnssAmt | ReconciliationUnrealizedInvest | RegularMonitoringEnforcement | RegularMonitoringEnfrcInd | RelatedEntity | RelatedEntityInd | RelatedOrgControlledEntity | RelatedOrganizationCtrlEntInd | RelatedOrganizations | RelatedOrganizationsAmt | RentalIncomeOrLoss | RentalIncomeOrLossGrp | ReportFin48Footnote | ReportInvestOthSecurities | ReportInvestmentsOtherSecInd | ReportLandBldgEquip | ReportLandBuildingEquipmentInd | ReportOtherAssets | ReportOtherAssetsInd | ReportOtherLiabilities | ReportOtherLiabilitiesInd | ReportProgRelInvest | ReportProgramRelatedInvstInd | RetainedEarningsEndowmentEtc | Revenue | RevenueAmt | RevenuesLessExpensesCY | RevenuesLessExpensesPriorYear | Royalties | RoyaltiesGrp | RoyaltiesRevenue | RoyaltiesRevenueGrp | RtnEarnEndowmentIncmOthFndsGrp | SalariesEtcCurrentYear | SalariesEtcPriorYear | SavingsAndTempCashInvestments | SavingsAndTempCashInvstGrp | ScheduleBRequired | ScheduleBRequiredInd | ScheduleJRequired | ScheduleJRequiredInd | ScheduleORequired | ScheduleORequiredInd | School | SchoolOperatingInd | SignificantChange | SignificantChangeInd | SignificantNewProgramServices | SignificantNewProgramSrvcInd | StateLegalDomicile | StatesWhereCopyOfReturnIsFiled | StatesWhereCopyOfReturnIsFldCd | SubjectToProxyTax | SubjectToProxyTaxInd | SubmittedOn | TaxExemptBondLiabilities | TaxExemptBondLiabilitiesGrp | TaxExemptBonds | TaxExemptBondsInd | TaxPeriod | TaxableDistributions | TaxableDistributionsInd | TaxablePartyNotification | TaxablePartyNotificationInd | TempOrPermanentEndowmentsInd | TemporarilyRestrictedNetAssets | TemporarilyRstrNetAssetsGrp | TermOrPermanentEndowments | TerminateOperationsInd | Terminated | TerminationOrContraction | TheBooksAreInCareOf | TotLiabNetAssetsFundBalanceGrp | TotReportableCompRltdOrgAmt | TotalAssets | TotalAssetsBOY | TotalAssetsBOYAmt | TotalAssetsEOY | TotalAssetsEOYAmt | TotalAssetsGrp | TotalCompGT150K | TotalCompGreaterThan150KInd | TotalContributions | TotalContributionsAmt | TotalEmployeeCnt | TotalExpensesCurrentYear | TotalExpensesPriorYear | TotalFunctionalExpenses | TotalFunctionalExpensesGrp | TotalFundrsngExpCurrentYear | TotalGrossUBI | TotalGrossUBIAmt | TotalJointCosts | TotalJointCostsGrp | TotalLiabNetAssetsFundBalances | TotalLiabilities | TotalLiabilitiesBOY | TotalLiabilitiesBOYAmt | TotalLiabilitiesEOY | TotalLiabilitiesEOYAmt | TotalLiabilitiesGrp | TotalNbrEmployees | TotalNbrVolunteers | TotalNetAssetsFundBalanceGrp | TotalNetAssetsFundBalances | TotalOfOtherProgramServiceExp | TotalOfOtherProgramServiceGrnt | TotalOfOtherProgramServiceRev | TotalOthProgramServiceRevGrp | TotalOthProgramServiceRevenue | TotalOtherCompensation | TotalOtherCompensationAmt | TotalOtherProgSrvcExpenseAmt | TotalOtherProgSrvcGrantAmt | TotalOtherProgSrvcRevenueAmt | TotalOtherRevenue | TotalProfFundrsngExpCY | TotalProfFundrsngExpPriorYear | TotalProgramServiceExpense | TotalProgramServiceExpensesAmt | TotalProgramServiceRevenue | TotalProgramServiceRevenueAmt | TotalReportableCompFrmRltdOrgs | TotalReportableCompFromOrg | TotalReportableCompFromOrgAmt | TotalRevenue | TotalRevenueCurrentYear | TotalRevenueGrp | TotalRevenuePriorYear | TotalVolunteersCnt | TransactionRelatedEntity | TransactionWithControlEntInd | TransfersToExemptNonChrtblOrg | Travel | TravelEntrtnmntPublicOfficials | TravelGrp | TrnsfrExmptNonChrtblRltdOrgInd | TypeOfOrganizationCorpInd | TypeOfOrganizationCorporation | TypeOfOrganizationOther | TypeOfOrganizationOtherInd | TypeOfOrganizationTrust | TypeOfOrganizationTrustInd | URL | USAddress | UnrelatedBusIncmOverLimitInd | UnrelatedBusinessIncome | UnrestrictedNetAssets | UnrestrictedNetAssetsGrp | UnsecuredNotesLoansPayable | UnsecuredNotesLoansPayableGrp | UponRequest | UponRequestInd | VotingMembersGoverningBodyCnt | VotingMembersIndependentCnt | WebSite | WebsiteAddressTxt | WhistleblowerPolicy | WhistleblowerPolicyInd | WrittenPolicyOrProcedure | WrittenPolicyOrProcedureInd | YearFormation | _id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | RetDoc1 | RetDoc2 | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | RAISE MONIES TO CONSTRUCT A FACILITY FOR YEAR-ROUND RECREATION PROGRAM DEDICATED TO CHILDREN AND ADULTS WITH SPECIAL NEEDS WHO RESIDE IN THE TOWN OF HEMPSTEAD. | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {u'PhoneNum': u'5164316946', u'USAddress': {u'CityNm': u'LIDO BEACH', u'StateAbbreviationCd': u'NY', u'ZIPCd': u'11561', u'AddressLine1Txt': u'630 LIDO BLVD'}, u'BusinessName': {u'BusinessNameLine1Txt': u'ANCHOR BUILDING FUND INC'}} | NaN | NaN | false | false | false | 0 | 0 | 0 | 0 | 34400 | 325000 | 0 | 290600 | 0 | 34400 | 0 | 0 | 325000 | NaN | NaN | NaN | {u'EOYAmt': u'290600'} | false | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | false | NaN | false | NaN | false | NaN | NaN | NaN | NaN | false | NaN | false | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | 93493316047805 | false | NaN | false | NaN | false | NaN | NaN | NaN | NaN | false | NaN | NaN | THE ORGANIZATIONS EXEMPT PURPOSE WAS TO RAISE MONIES TO CONSTRUCT A FACILTIY TO OPERATE ITS YEAR-ROUND PROGRAM. | NaN | {u'#text': u'true', u'@referenceDocumentId': u'RetDoc3'} | NaN | NaN | false | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | false | NaN | 271317072 | NaN | false | 0 | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalAmt': u'9400', u'ManagementAndGeneralAmt': u'9400'} | NaN | NaN | {u'TotalAmt': u'25000', u'ManagementAndGeneralAmt': u'25000'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | false | NaN | NaN | NaN | false | NaN | false | NaN | NaN | NaN | false | NaN | NaN | NaN | false | NaN | NaN | [{u'PersonNm': u'JOHN MALLOY', u'TitleTxt': u'PRESIDENT', u'AverageHoursPerWeekRt': u'2.00', u'ReportableCompFromRltdOrgAmt': u'0', u'OtherCompensationAmt': u'0', u'OfficerInd': u'X', u'ReportableCompFromOrgAmt': u'0'}, {u'PersonNm': u'ANNA DRAGO... | NaN | false | NaN | 990 | 2009 | false | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | 2 | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | false | NaN | NaN | NaN | NaN | 325000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | 2 | 0 | false | NaN | NaN | NaN | false | 2 | NaN | NaN | false | NaN | X | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | X | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | True | True | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2016-03-21T17:23:53 | NY | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | false | NaN | false | NaN | true | NaN | false | NaN | NaN | NaN | NaN | NaN | X | NaN | NaN | NaN | false | NaN | false | NaN | NaN | RAISE MONIES TO CONSTRUCT A FACILITY FOR YEAR-ROUND RECREATION PROGRAM DEDICATED TO CHILDREN AND ADULTS WITH SPECIAL NEEDS WHO RESIDE IN THE TOWN OF HEMPSTEAD. | NaN | NaN | false | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 290600 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | X | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 201503169349304780 | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | false | NaN | NaN | NaN | NaN | X | NaN | X | ANCHOR BUILDING FUND INC | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumnAmt': u'325000', u'ExclusionAmt': u'325000', u'Desc': u'SETTLEMENT'} | NaN | 325000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | JOHN MALLOY | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | false | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 290600 | NaN | NaN | NaN | NaN | false | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | false | NaN | false | NaN | false | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | false | NaN | true | NaN | false | NaN | true | NaN | false | NaN | NaN | NY | NaN | false | 2016-01-30 | NaN | NaN | NaN | false | 201412 | NaN | NaN | NaN | false | false | NaN | NaN | NaN | false | NaN | NaN | NaN | {u'BOYAmt': u'0', u'EOYAmt': u'290600'} | NaN | NaN | NaN | NaN | NaN | 290600 | {u'BOYAmt': u'0', u'EOYAmt': u'290600'} | NaN | false | NaN | NaN | 0 | NaN | NaN | NaN | {u'TotalAmt': u'34400', u'ProgramServicesAmt': u'0', u'ManagementAndGeneralAmt': u'34400', u'FundraisingAmt': u'0'} | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | {u'BOYAmt': u'0', u'EOYAmt': u'0'} | NaN | NaN | {u'BOYAmt': u'0', u'EOYAmt': u'290600'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumnAmt': u'325000', u'ExclusionAmt': u'325000'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | X | NaN | NaN | NaN | NaN | NaN | https://s3.amazonaws.com/irs-form-990/201503169349304780_public.xml | {u'CityNm': u'LIDO BEACH', u'StateAbbreviationCd': u'NY', u'ZIPCd': u'11561', u'AddressLine1Txt': u'630 LIDO BLVD'} | false | NaN | NaN | {u'EOYAmt': u'290600'} | NaN | NaN | NaN | X | 2 | 2 | NaN | WWW.CAMPANCHOR.ORG | NaN | false | NaN | NaN | NaN | 57cce7823ffc5a55fe7f54b5 |
print df['TaxPeriod'].value_counts().sum()
df['FYE'] = 'FY' + df['TaxPeriod'].str[:4]
print df['FYE'].value_counts().sum()
print df['FYE'].value_counts()
538 538 FY2013 120 FY2014 117 FY2012 112 FY2011 97 FY2010 77 FY2015 15 Name: FYE, dtype: int64
print df['EIN'].value_counts().sum()
538
columns = ['EIN', 'FYE', 'AuditCommittee', 'AuditCommitteeInd', 'CYContributionsGrantsAmt',
'CYTotalFundraisingExpenseAmt', 'GoverningBodyVotingMembersCnt', 'IndependentAuditFinancialStmt',
'IndependentAuditFinclStmtInd', 'IndependentVotingMemberCnt', 'LoanToOfficerOrDQP',
'NbrIndependentVotingMembers', 'TotalContributions', 'TotalContributionsAmt',
'TotalFundrsngExpCurrentYear', 'TotalFunctionalExpensesGrp', 'FundraisingAmt',
'ConsolidatedAuditFinancialStmt', 'ConsolidatedAuditFinclStmtInd',
'AuditedFinancialStmtAttInd', 'FSAudited', 'FSAuditedBasis', 'FSAuditedBasisGrp', 'FSAuditedInd',
'IndependentVotingMemberCnt', 'NumberIndependentVotingMembers', 'VotingMembersIndependentCnt'
]
print df['AuditCommittee'].value_counts().sum()
print df['AuditCommitteeInd'].value_counts().sum()
print df['CYContributionsGrantsAmt'].value_counts().sum()
print df['CYTotalFundraisingExpenseAmt'].value_counts().sum()
print df['GoverningBodyVotingMembersCnt'].value_counts().sum()
print len(df[df['IndependentAuditFinancialStmt'].notnull()])
print len(df[df['IndependentAuditFinclStmtInd'].notnull()]), '\n'
print df['IndependentVotingMemberCnt'].value_counts().sum()
#print df['LoanToOfficerOrDQP'].value_counts().sum()
print len(df[df['LoanToOfficerOrDQP'].notnull()])
print df['NbrIndependentVotingMembers'].value_counts().sum()
print df['TotalContributions'].value_counts().sum()
print df['TotalContributionsAmt'].value_counts().sum()
print df['TotalFundrsngExpCurrentYear'].value_counts().sum()
print len(df[df['TotalFunctionalExpensesGrp'].notnull()])
print df['FundraisingAmt'].value_counts().sum()
#'ConsolidatedAuditFinancialStmt', u'ConsolidatedAuditFinclStmtInd'
# u'AuditedFinancialStmtAttInd',
#FSAudited', u'FSAuditedBasis', u'FSAuditedBasisGrp', u'FSAuditedInd',
#'IndependentVotingMemberCnt'
#'NumberIndependentVotingMembers'
#'VotingMembersIndependentCnt'
271 138 194 194 194 344 194 194 344 344 323 176 344 194 32
columns = ['EIN', 'FYE', 'AuditCommittee', 'AuditCommitteeInd',
'GoverningBodyVotingMembersCnt', 'VotingMembersGoverningBodyCnt', #THESE TWO SEEM TO BE THE SAME
'IndependentVotingMemberCnt', 'VotingMembersIndependentCnt', #THESE TWO SEEM TO BE THE SAME
'NbrVotingGoverningBodyMembers', 'NbrVotingMembersGoverningBody', #THESE TWO SEEM TO BE THE SAME
'NumberIndependentVotingMembers', 'NbrIndependentVotingMembers', #THESE TWO SEEM TO BE THE SAME
'IndependentAuditFinancialStmt', 'ConsolidatedAuditFinancialStmt', 'FSAudited',
'IndependentAuditFinclStmtInd', 'ConsolidatedAuditFinclStmtInd', 'FSAuditedInd',
'AuditedFinancialStmtAttInd', 'FSAuditedBasis', 'FSAuditedBasisGrp',
'LoanToOfficerOrDQP', 'FundraisingAmt', 'TotalFundrsngExpCurrentYear',
'CYTotalFundraisingExpenseAmt', 'TotalFunctionalExpensesGrp',
'TotalContributions', 'TotalContributionsAmt', 'CYContributionsGrantsAmt',
]
print df[columns[4]].value_counts().sum()
print df[columns[5]].value_counts().sum()
print df[columns[6]].value_counts().sum()
print df[columns[7]].value_counts().sum(), '\n'
print df[columns[8]].value_counts().sum()
print df[columns[9]].value_counts().sum()
print df[columns[10]].value_counts().sum()
print df[columns[11]].value_counts().sum()
194 194 194 194 344 344 344 344
df[columns][2:5]
EIN | FYE | AuditCommittee | AuditCommitteeInd | GoverningBodyVotingMembersCnt | VotingMembersGoverningBodyCnt | IndependentVotingMemberCnt | VotingMembersIndependentCnt | NbrVotingGoverningBodyMembers | NbrVotingMembersGoverningBody | NumberIndependentVotingMembers | NbrIndependentVotingMembers | IndependentAuditFinancialStmt | ConsolidatedAuditFinancialStmt | FSAudited | IndependentAuditFinclStmtInd | ConsolidatedAuditFinclStmtInd | FSAuditedInd | AuditedFinancialStmtAttInd | FSAuditedBasis | FSAuditedBasisGrp | LoanToOfficerOrDQP | FundraisingAmt | TotalFundrsngExpCurrentYear | CYTotalFundraisingExpenseAmt | TotalFunctionalExpensesGrp | TotalContributions | TotalContributionsAmt | CYContributionsGrantsAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | 581494135 | FY2014 | NaN | false | 4 | 4 | 0 | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {u'#text': u'false', u'@referenceDocumentId': u'IRS990ScheduleD'} | {u'#text': u'false', u'@referenceDocumentId': u'IRS990ScheduleD'} | true | NaN | NaN | {u'SeparateBasisFinclStmtInd': u'X'} | NaN | NaN | NaN | 359301 | {u'TotalAmt': u'1382846', u'ProgramServicesAmt': u'830309', u'ManagementAndGeneralAmt': u'193236', u'FundraisingAmt': u'359301'} | NaN | 1216536 | 1216536 |
3 | 731284606 | FY2012 | 1 | NaN | NaN | NaN | NaN | NaN | 6 | 6 | 4 | 4 | {u'#text': u'1', u'@referenceDocumentId': u'RetDoc1040000001'} | {u'#text': u'0', u'@referenceDocumentId': u'RetDoc1040000001'} | 1 | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | 392027 | NaN | NaN | 9299628 | NaN | NaN |
4 | 262224994 | FY2013 | NaN | 1 | 5 | 5 | 4 | 4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {u'#text': u'1', u'@referenceDocumentId': u'RetDoc1040000001'} | {u'#text': u'0', u'@referenceDocumentId': u'RetDoc1040000001'} | 1 | NaN | NaN | {u'SeparateBasisFinclStmtInd': u'X'} | NaN | NaN | NaN | 17564 | {u'TotalAmt': u'608945', u'ProgramServicesAmt': u'495257', u'ManagementAndGeneralAmt': u'96124', u'FundraisingAmt': u'17564'} | NaN | 359003 | 359003 |
df['audit_committee'] = np.nan
print len(df[df['audit_committee'].notnull()])
df['audit_committee'] = np.where(df['AuditCommittee'].notnull(), df['AuditCommittee'],
df['audit_committee'] )
print len(df[df['audit_committee'].notnull()])
df['audit_committee'] = np.where( ((df['audit_committee'].isnull()) &
(df['AuditCommitteeInd'].notnull())),
df['AuditCommitteeInd'], df['audit_committee'] )
print len(df[df['audit_committee'].notnull()])
df['audit_committee'].value_counts()
0 271 409
1 212 true 144 false 32 0 21 Name: audit_committee, dtype: int64
df['audit_committee'] = np.where( df['audit_committee']=='true', 1, df['audit_committee'] )
df['audit_committee'] = np.where( df['audit_committee']=='1', 1, df['audit_committee'] )
df['audit_committee'] = np.where( df['audit_committee']=='false', 0, df['audit_committee'] )
df['audit_committee'] = np.where( df['audit_committee']=='0', 0, df['audit_committee'] )
print len(df[df['audit_committee'].notnull()])
df['audit_committee'].value_counts()
409
1 356 0 53 Name: audit_committee, dtype: int64
df['board_size'] = np.nan
print len(df[df['board_size'].notnull()])
df['board_size'] = np.where(df['GoverningBodyVotingMembersCnt'].notnull(), df['GoverningBodyVotingMembersCnt'],
df['board_size'] )
print len(df[df['board_size'].notnull()])
df['board_size'] = np.where( ((df['board_size'].isnull()) &
(df['NbrVotingGoverningBodyMembers'].notnull())),
df['NbrVotingGoverningBodyMembers'], df['board_size'] )
print len(df[df['board_size'].notnull()])
df['board_size'].value_counts()
#'GoverningBodyVotingMembersCnt', 'VotingMembersGoverningBodyCnt', #THESE TWO SEEM TO BE THE SAME
#'IndependentVotingMemberCnt', 'VotingMembersIndependentCnt', #THESE TWO SEEM TO BE THE SAME
#'NbrVotingGoverningBodyMembers', 'NbrVotingMembersGoverningBody', #THESE TWO SEEM TO BE THE SAME
#'NumberIndependentVotingMembers', 'NbrIndependentVotingMembers', #THESE TWO SEEM TO BE THE SAME
0 194 538
3 92 4 56 5 44 7 39 6 27 9 24 8 23 11 19 10 18 12 17 17 17 2 16 30 12 13 12 23 12 25 12 15 12 21 10 16 8 22 7 18 6 19 6 27 5 14 5 36 4 24 4 29 3 49 2 60 2 26 2 82 2 44 2 0 2 80 2 31 2 28 2 83 1 59 1 62 1 35 1 34 1 1 1 20 1 33 1 40 1 79 1 Name: board_size, dtype: int64
df['independent_members'] = np.nan
print len(df[df['independent_members'].notnull()])
df['independent_members'] = np.where(df['IndependentVotingMemberCnt'].notnull(), df['IndependentVotingMemberCnt'],
df['independent_members'] )
print len(df[df['independent_members'].notnull()])
df['independent_members'] = np.where( ((df['independent_members'].isnull()) &
(df['NumberIndependentVotingMembers'].notnull())),
df['NumberIndependentVotingMembers'], df['independent_members'] )
print len(df[df['board_size'].notnull()])
df['independent_members'].value_counts()
#'GoverningBodyVotingMembersCnt', 'VotingMembersGoverningBodyCnt', #THESE TWO SEEM TO BE THE SAME
#'IndependentVotingMemberCnt', 'VotingMembersIndependentCnt', #THESE TWO SEEM TO BE THE SAME
#'NbrVotingGoverningBodyMembers', 'NbrVotingMembersGoverningBody', #THESE TWO SEEM TO BE THE SAME
#'NumberIndependentVotingMembers', 'NbrIndependentVotingMembers', #THESE TWO SEEM TO BE THE SAME
0 194 538
0 88 3 48 4 42 7 32 2 30 5 28 9 28 8 23 6 18 11 17 16 16 12 15 25 11 10 10 15 9 23 9 30 8 17 8 19 8 1 8 13 8 24 8 14 7 21 7 18 7 22 6 28 5 20 5 27 4 29 4 36 3 73 2 59 2 44 2 34 2 32 2 72 1 74 1 83 1 76 1 58 1 40 1 31 1 35 1 Name: independent_members, dtype: int64
columns = ['EIN', 'FYE',
#'AuditCommittee', 'AuditCommitteeInd',
'audit_committee',
'board_size', 'independent_members',
#'GoverningBodyVotingMembersCnt', 'VotingMembersGoverningBodyCnt', #THESE TWO SEEM TO BE THE SAME
#'IndependentVotingMemberCnt', 'VotingMembersIndependentCnt', #THESE TWO SEEM TO BE THE SAME
#'NbrVotingGoverningBodyMembers', 'NbrVotingMembersGoverningBody', #THESE TWO SEEM TO BE THE SAME
#'NumberIndependentVotingMembers', 'NbrIndependentVotingMembers', #THESE TWO SEEM TO BE THE SAME
'IndependentAuditFinancialStmt', 'ConsolidatedAuditFinancialStmt', 'FSAudited',
'IndependentAuditFinclStmtInd', 'ConsolidatedAuditFinclStmtInd', 'FSAuditedInd',
'AuditedFinancialStmtAttInd', 'FSAuditedBasis', 'FSAuditedBasisGrp',
'LoanToOfficerOrDQP', 'FundraisingAmt', 'TotalFundrsngExpCurrentYear',
'CYTotalFundraisingExpenseAmt', 'TotalFunctionalExpensesGrp',
'TotalContributions', 'TotalContributionsAmt', 'CYContributionsGrantsAmt',
]
df[columns][2:5]
EIN | FYE | audit_committee | board_size | independent_members | IndependentAuditFinancialStmt | ConsolidatedAuditFinancialStmt | FSAudited | IndependentAuditFinclStmtInd | ConsolidatedAuditFinclStmtInd | FSAuditedInd | AuditedFinancialStmtAttInd | FSAuditedBasis | FSAuditedBasisGrp | LoanToOfficerOrDQP | FundraisingAmt | TotalFundrsngExpCurrentYear | CYTotalFundraisingExpenseAmt | TotalFunctionalExpensesGrp | TotalContributions | TotalContributionsAmt | CYContributionsGrantsAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | 581494135 | FY2014 | 0 | 4 | 0 | NaN | NaN | NaN | {u'#text': u'false', u'@referenceDocumentId': u'IRS990ScheduleD'} | {u'#text': u'false', u'@referenceDocumentId': u'IRS990ScheduleD'} | true | NaN | NaN | {u'SeparateBasisFinclStmtInd': u'X'} | NaN | NaN | NaN | 359301 | {u'TotalAmt': u'1382846', u'ProgramServicesAmt': u'830309', u'ManagementAndGeneralAmt': u'193236', u'FundraisingAmt': u'359301'} | NaN | 1216536 | 1216536 |
3 | 731284606 | FY2012 | 1 | 6 | 4 | {u'#text': u'1', u'@referenceDocumentId': u'RetDoc1040000001'} | {u'#text': u'0', u'@referenceDocumentId': u'RetDoc1040000001'} | 1 | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | 392027 | NaN | NaN | 9299628 | NaN | NaN |
4 | 262224994 | FY2013 | 1 | 5 | 4 | NaN | NaN | NaN | {u'#text': u'1', u'@referenceDocumentId': u'RetDoc1040000001'} | {u'#text': u'0', u'@referenceDocumentId': u'RetDoc1040000001'} | 1 | NaN | NaN | {u'SeparateBasisFinclStmtInd': u'X'} | NaN | NaN | NaN | 17564 | {u'TotalAmt': u'608945', u'ProgramServicesAmt': u'495257', u'ManagementAndGeneralAmt': u'96124', u'FundraisingAmt': u'17564'} | NaN | 359003 | 359003 |
#print df[columns[5]].value_counts().sum()
#print df[columns[6]].value_counts().sum()
print df[columns[7]].value_counts().sum(), '\n'
#print df[columns[8]].value_counts().sum()
#print df[columns[9]].value_counts().sum()
print df[columns[10]].value_counts().sum()
#print df[columns[11]].value_counts().sum()
344 194
df['audited_financials'] = np.nan
print len(df[df['audited_financials'].notnull()])
df['audited_financials'] = np.where(df['FSAudited'].notnull(), df['FSAudited'],
df['audited_financials'] )
print len(df[df['audited_financials'].notnull()])
df['audited_financials'] = np.where( ((df['audited_financials'].isnull()) &
(df['FSAuditedInd'].notnull())),
df['FSAuditedInd'], df['audited_financials'] )
print len(df[df['audited_financials'].notnull()])
df['audited_financials'].value_counts()
0 344 538
1 232 true 148 false 107 0 51 Name: audited_financials, dtype: int64
df['audited_financials'] = np.where( df['audited_financials']=='true', 1, df['audited_financials'] )
df['audited_financials'] = np.where( df['audited_financials']=='1', 1, df['audited_financials'] )
df['audited_financials'] = np.where( df['audited_financials']=='false', 0, df['audited_financials'] )
df['audited_financials'] = np.where( df['audited_financials']=='0', 0, df['audited_financials'] )
print len(df[df['audited_financials'].notnull()])
df['audited_financials'].value_counts()
538
1 380 0 158 Name: audited_financials, dtype: int64
columns = ['EIN', 'FYE',
#'AuditCommittee', 'AuditCommitteeInd',
'audit_committee',
'board_size', 'independent_members',
#'GoverningBodyVotingMembersCnt', 'VotingMembersGoverningBodyCnt', #THESE TWO SEEM TO BE THE SAME
#'IndependentVotingMemberCnt', 'VotingMembersIndependentCnt', #THESE TWO SEEM TO BE THE SAME
#'NbrVotingGoverningBodyMembers', 'NbrVotingMembersGoverningBody', #THESE TWO SEEM TO BE THE SAME
#'NumberIndependentVotingMembers', 'NbrIndependentVotingMembers', #THESE TWO SEEM TO BE THE SAME
'audited_financials',
#'IndependentAuditFinancialStmt', 'ConsolidatedAuditFinancialStmt', 'FSAudited',
#'IndependentAuditFinclStmtInd', 'ConsolidatedAuditFinclStmtInd', 'FSAuditedInd',
#'AuditedFinancialStmtAttInd', 'FSAuditedBasis', 'FSAuditedBasisGrp',
#'LoanToOfficerOrDQP',
#'FundraisingAmt',
'TotalFundrsngExpCurrentYear', 'CYTotalFundraisingExpenseAmt', 'TotalFunctionalExpensesGrp',
'TotalContributions', 'TotalContributionsAmt', 'CYContributionsGrantsAmt',
]
df[columns][2:5]
EIN | FYE | audit_committee | board_size | independent_members | audited_financials | TotalFundrsngExpCurrentYear | CYTotalFundraisingExpenseAmt | TotalFunctionalExpensesGrp | TotalContributions | TotalContributionsAmt | CYContributionsGrantsAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | 581494135 | FY2014 | 0 | 4 | 0 | 1 | NaN | 359301 | {u'TotalAmt': u'1382846', u'ProgramServicesAmt': u'830309', u'ManagementAndGeneralAmt': u'193236', u'FundraisingAmt': u'359301'} | NaN | 1216536 | 1216536 |
3 | 731284606 | FY2012 | 1 | 6 | 4 | 1 | 392027 | NaN | NaN | 9299628 | NaN | NaN |
4 | 262224994 | FY2013 | 1 | 5 | 4 | 1 | NaN | 17564 | {u'TotalAmt': u'608945', u'ProgramServicesAmt': u'495257', u'ManagementAndGeneralAmt': u'96124', u'FundraisingAmt': u'17564'} | NaN | 359003 | 359003 |
print df[columns[6]].value_counts().sum()
print df[columns[7]].value_counts().sum()
print len(df[df[columns[8]].notnull()])
344 194 194
df['fundraising_exp'] = np.nan
print len(df[df['fundraising_exp'].notnull()])
df['fundraising_exp'] = np.where(df['TotalFundrsngExpCurrentYear'].notnull(), df['TotalFundrsngExpCurrentYear'],
df['fundraising_exp'] )
print len(df[df['fundraising_exp'].notnull()])
df['fundraising_exp'] = np.where( ((df['fundraising_exp'].isnull()) &
(df['CYTotalFundraisingExpenseAmt'].notnull())),
df['CYTotalFundraisingExpenseAmt'], df['fundraising_exp'] )
print len(df[df['fundraising_exp'].notnull()])
df['fundraising_exp'].value_counts()
0 344 538
0 238 930602 2 150000 2 12107156 1 1218686 1 250988 1 4426103 1 1182027 1 48180 1 28125 1 7735 1 1127595 1 890703 1 1432842 1 11357773 1 234280 1 3588 1 4267087 1 583844 1 57597 1 456843 1 99370 1 53125 1 178386 1 73831 1 1203637 1 277828 1 659270 1 4685328 1 159980 1 ... 10710 1 11730 1 677021 1 36836 1 68477 1 12342 1 8585 1 180000 1 150 1 359301 1 74632 1 289015 1 2925264 1 76621 1 26529 1 1170772 1 132025 1 5417144 1 330771 1 304506 1 86011 1 43224 1 57971 1 17757 1 21434 1 32860 1 401830 1 593922 1 635121 1 110725 1 Name: fundraising_exp, dtype: int64
df['low_quality_reporting'] = np.where(df['fundraising_exp']=='0', 1,0)
df['low_quality_reporting'].value_counts()
0 300 1 238 Name: low_quality_reporting, dtype: int64
columns = ['EIN', 'FYE',
#'AuditCommittee', 'AuditCommitteeInd',
'audit_committee',
'board_size', 'independent_members',
#'GoverningBodyVotingMembersCnt', 'VotingMembersGoverningBodyCnt', #THESE TWO SEEM TO BE THE SAME
#'IndependentVotingMemberCnt', 'VotingMembersIndependentCnt', #THESE TWO SEEM TO BE THE SAME
#'NbrVotingGoverningBodyMembers', 'NbrVotingMembersGoverningBody', #THESE TWO SEEM TO BE THE SAME
#'NumberIndependentVotingMembers', 'NbrIndependentVotingMembers', #THESE TWO SEEM TO BE THE SAME
'audited_financials',
#'IndependentAuditFinancialStmt', 'ConsolidatedAuditFinancialStmt', 'FSAudited',
#'IndependentAuditFinclStmtInd', 'ConsolidatedAuditFinclStmtInd', 'FSAuditedInd',
#'AuditedFinancialStmtAttInd', 'FSAuditedBasis', 'FSAuditedBasisGrp',
#'LoanToOfficerOrDQP',
#'FundraisingAmt',
#'TotalFundrsngExpCurrentYear', 'CYTotalFundraisingExpenseAmt', 'TotalFunctionalExpensesGrp',
'fundraising_exp', 'low_quality_reporting',
'TotalContributions', 'TotalContributionsAmt', 'CYContributionsGrantsAmt',
]
df[columns][2:5]
EIN | FYE | audit_committee | board_size | independent_members | audited_financials | fundraising_exp | low_quality_reporting | TotalContributions | TotalContributionsAmt | CYContributionsGrantsAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|
2 | 581494135 | FY2014 | 0 | 4 | 0 | 1 | 359301 | 0 | NaN | 1216536 | 1216536 |
3 | 731284606 | FY2012 | 1 | 6 | 4 | 1 | 392027 | 0 | 9299628 | NaN | NaN |
4 | 262224994 | FY2013 | 1 | 5 | 4 | 1 | 17564 | 0 | NaN | 359003 | 359003 |
df[columns][5:25]
EIN | FYE | audit_committee | board_size | independent_members | audited_financials | fundraising_exp | low_quality_reporting | TotalContributions | TotalContributionsAmt | CYContributionsGrantsAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|
5 | 141631995 | FY2012 | 1 | 3 | 0 | 1 | 592976 | 0 | 3328598 | NaN | NaN |
6 | 581925867 | FY2015 | 0 | 9 | 9 | 1 | 26070 | 0 | NaN | 480315 | 480315 |
7 | 431196717 | FY2015 | 0 | 7 | 7 | 1 | 0 | 1 | NaN | 425909 | 425909 |
8 | 131624000 | FY2012 | 1 | 36 | 36 | 1 | 267236 | 0 | 97671159 | NaN | NaN |
9 | 390806314 | FY2012 | 1 | 35 | 34 | 1 | 919698 | 0 | 4041706 | NaN | NaN |
10 | 591561180 | FY2011 | 1 | 59 | 58 | 1 | 4685328 | 0 | 15298518 | NaN | NaN |
11 | 741152597 | FY2010 | NaN | 49 | 32 | 0 | 0 | 1 | 18821365 | NaN | NaN |
12 | 042129889 | FY2013 | 1 | 27 | 24 | 1 | 442668 | 0 | 23922926 | NaN | NaN |
13 | 261636099 | FY2013 | 1 | 19 | 18 | 1 | 677021 | 0 | NaN | 15557166 | 15557166 |
14 | 300038297 | FY2010 | 1 | 10 | 8 | 1 | 257510 | 0 | 5704220 | NaN | NaN |
15 | 390806314 | FY2010 | 1 | 33 | 32 | 1 | 1003859 | 0 | 6320334 | NaN | NaN |
16 | 042129889 | FY2011 | 1 | 25 | 25 | 1 | 890550 | 0 | 25889793 | NaN | NaN |
17 | 300038297 | FY2011 | 1 | 11 | 9 | 1 | 258199 | 0 | 6828824 | NaN | NaN |
18 | 521007153 | FY2011 | 1 | 22 | 22 | 1 | 0 | 1 | NaN | NaN | NaN |
19 | 941347046 | FY2011 | 1 | 22 | 22 | 1 | 2913100 | 0 | 13093432 | NaN | NaN |
20 | 251892177 | FY2011 | 1 | 25 | 24 | 1 | 456843 | 0 | 3850700 | NaN | NaN |
21 | 134147912 | FY2012 | 1 | 7 | 7 | 1 | 78691 | 0 | 2578781 | NaN | NaN |
22 | 132738818 | FY2012 | 1 | 30 | 30 | 1 | 1170772 | 0 | 24067624 | NaN | NaN |
23 | 205183267 | FY2014 | 1 | 11 | 10 | 1 | 47503 | 0 | NaN | 36042992 | 36042992 |
24 | 912053374 | FY2015 | NaN | 23 | 0 | 0 | 38616 | 0 | NaN | 197727 | 197727 |
df[columns].to_pickle('additional robustness variables for advisory orgs from e-file data.pkl')
df[columns].tail()
EIN | FYE | audit_committee | board_size | independent_members | audited_financials | fundraising_exp | low_quality_reporting | TotalContributions | TotalContributionsAmt | CYContributionsGrantsAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|
533 | 570789483 | FY2015 | 1 | 16 | 16 | 1 | 321334 | 0 | NaN | 6521066 | 6521066 |
534 | 362169124 | FY2015 | 1 | 17 | 16 | 1 | 1650849 | 0 | NaN | 5195645 | 5195645 |
535 | 630932057 | FY2014 | 1 | 9 | 9 | 1 | 0 | 1 | NaN | 5098539 | 5098539 |
536 | 941347046 | FY2014 | NaN | 22 | 19 | 0 | 3146137 | 0 | NaN | 11717525 | 11717525 |
537 | 562439708 | FY2014 | 1 | 7 | 7 | 1 | 0 | 1 | NaN | 3548461 | 3548461 |
df[columns].describe().T
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
low_quality_reporting | 538.0 | 0.442379 | 0.497131 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
print columns
['EIN', 'FYE', 'audit_committee', 'board_size', 'independent_members', 'audited_financials', 'fundraising_exp', 'low_quality_reporting', 'TotalContributions', 'TotalContributionsAmt', 'CYContributionsGrantsAmt']
print len(df[df['TotalContributions'].notnull()])
print len(df[df['TotalContributionsAmt'].notnull()])
print len(df[df['CYContributionsGrantsAmt'].notnull()])
323 176 194
176+194
370
columns = ['EIN', 'FYE', 'audit_committee', 'board_size', 'independent_members', 'audited_financials',
'fundraising_exp', 'low_quality_reporting']
# 'TotalContributions', 'TotalContributionsAmt', 'CYContributionsGrantsAmt']
df[columns][:5]
EIN | FYE | audit_committee | board_size | independent_members | audited_financials | fundraising_exp | low_quality_reporting | |
---|---|---|---|---|---|---|---|---|
0 | 271317072 | FY2014 | NaN | 2 | 2 | 0 | 0 | 1 |
1 | 953246624 | FY2014 | 1 | 9 | 9 | 1 | 0 | 1 |
2 | 581494135 | FY2014 | 0 | 4 | 0 | 1 | 359301 | 0 |
3 | 731284606 | FY2012 | 1 | 6 | 4 | 1 | 392027 | 0 |
4 | 262224994 | FY2013 | 1 | 5 | 4 | 1 | 17564 | 0 |
df[columns].dtypes
EIN object FYE object audit_committee object board_size object independent_members object audited_financials object fundraising_exp object low_quality_reporting int64 dtype: object
print len(df[df[columns[0]].isnull()])
print len(df[df[columns[1]].isnull()])
print len(df[df[columns[2]].isnull()])
print len(df[df[columns[3]].isnull()])
print len(df[df[columns[4]].isnull()])
print len(df[df[columns[5]].isnull()])
print len(df[df[columns[6]].isnull()])
print len(df[df[columns[7]].isnull()])
0 0 129 0 0 0 0 0
EIN | FYE | audit_committee | board_size | independent_members | audited_financials | fundraising_exp | low_quality_reporting | |
---|---|---|---|---|---|---|---|---|
0 | 271317072 | FY2014 | NaN | 2 | 2 | 0 | 0 | 1 |
11 | 741152597 | FY2010 | NaN | 49 | 32 | 0 | 0 | 1 |
df = df[columns]
df = df.sort_values(by=['EIN', 'FYE'], ascending=[1,0])
df[:5]
EIN | FYE | audit_committee | board_size | independent_members | audited_financials | fundraising_exp | low_quality_reporting | |
---|---|---|---|---|---|---|---|---|
252 | 030498214 | FY2012 | NaN | 6 | 6 | 0 | 140183 | 0 |
433 | 030498214 | FY2011 | 0 | 6 | 6 | 0 | 237568 | 0 |
78 | 030498214 | FY2010 | 0 | 2 | 2 | 0 | 73334 | 0 |
333 | 042129889 | FY2014 | 1 | 23 | 19 | 1 | 0 | 1 |
12 | 042129889 | FY2013 | 1 | 27 | 24 | 1 | 442668 | 0 |
print len(df.groupby('EIN').agg('first'))
df_advisories = df.groupby('EIN').agg('first')
print len(df_advisories)
df_advisories = df_advisories.reset_index()
df_advisories[:5]
152 152
EIN | FYE | audit_committee | board_size | independent_members | audited_financials | fundraising_exp | low_quality_reporting | |
---|---|---|---|---|---|---|---|---|
0 | 030498214 | FY2012 | 0.0 | 6 | 6 | 0 | 140183 | 0 |
1 | 042129889 | FY2014 | 1.0 | 23 | 19 | 1 | 0 | 1 |
2 | 042701694 | FY2014 | 1.0 | 19 | 19 | 1 | 26101 | 0 |
3 | 050516084 | FY2014 | 1.0 | 2 | 1 | 0 | 187725 | 0 |
4 | 061741467 | FY2014 | 0.0 | 10 | 10 | 0 | 12108 | 0 |
pwd
u'/Users/gregorysaxton/SkyDrive/Mining_Firms'
cd '/Users/gregorysaxton/Google Drive/SOX'
/Users/gregorysaxton/Google Drive/SOX
df_advisories.to_pickle('Additional Variables for Robustness Tests - 152 advisory orgs.pkl')
print df_advisories.columns.tolist()
['EIN', 'FYE', 'audit_committee', 'board_size', 'independent_members', 'audited_financials', 'fundraising_exp', 'low_quality_reporting']
from __future__ import division
print 321/8304
0.0386560693642
#columns = ['fund_expense_percent_2016', 'program_expense_growth_2016', 'independent_board_2016',
# 'audited_financials_2016', 'no_loans_related_2016', 'contributions_gifts_grants_2016',
# 'total_contributions_2016', 'fund_expense_percent_2011', 'fund_expense_2011',
# 'primary_revenue_growth_2011', 'program_expense_growth_2011', 'independent_board_2011',
# 'audited_financials_2011', 'no_loans_related_2011', 'pt1_num_vtng_gvrn_bdy_mems', 'pt1_num_ind_vtng_mems',
# 'num_vtng_gvrn_bdy_mems', 'num_ind_vtng_mems', 'contri_grnts_cy', 'tot_fndrsng_exp_cy', 'aud_fincl_stmts',
# 'fs_audited', 'audit_committee']
#print df[columns[17]].value_counts(), '\n'
#df2 = pd.DataFrame(list(filings.find()))
#print '# of columns:', len(df2.columns)
#print '# of observations:', len(df2)
#df2.head(1)
'''
df2[['CYContributionsGrantsAmt', 'FederatedCampaigns', 'FederatedCampaignsAmt',
'MembershipDues', 'MembershipDuesAmt',
'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
'FundraisingActivities', 'FundraisingActivitiesInd', 'FundraisingAmt',
'FundraisingDirectExpenses', 'FundraisingDirectExpensesAmt', 'FundraisingEvents',
'FundraisingGrossIncomeAmt',
]][:5]
'RelatedOrganizations', 'RelatedOrganizationsAmt',
'GovernmentGrants', 'GovernmentGrantsAmt',
'PYProgramServiceRevenueAmt', 'CYProgramServiceRevenueAmt',
'ProgramServiceRevenue', 'ProgramServiceRevenueCY', 'ProgramServiceRevenueGrp',
'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt',
'CYInvestmentIncomeAmt', 'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp',
'OtherRevenueTotalAmt', 'OtherRevenueMiscGrp',
'TotalOtherRevenue', 'CYOtherRevenueAmt'
]
'''
"\ndf2[['CYContributionsGrantsAmt', 'FederatedCampaigns', 'FederatedCampaignsAmt', \n 'MembershipDues', 'MembershipDuesAmt',\n 'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',\n 'FundraisingActivities', 'FundraisingActivitiesInd', 'FundraisingAmt', \n 'FundraisingDirectExpenses', 'FundraisingDirectExpensesAmt', 'FundraisingEvents', \n 'FundraisingGrossIncomeAmt',\n ]][:5]\n\n \n 'RelatedOrganizations', 'RelatedOrganizationsAmt',\n 'GovernmentGrants', 'GovernmentGrantsAmt',\n 'PYProgramServiceRevenueAmt', 'CYProgramServiceRevenueAmt',\n 'ProgramServiceRevenue', 'ProgramServiceRevenueCY', 'ProgramServiceRevenueGrp',\n 'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt', \n 'CYInvestmentIncomeAmt', 'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp', \n 'OtherRevenueTotalAmt', 'OtherRevenueMiscGrp',\n 'TotalOtherRevenue', 'CYOtherRevenueAmt'\n ]\n"
'''
df2[[ 'TaxPeriod',
#'CYContributionsGrantsAmt', 'FederatedCampaigns', 'FederatedCampaignsAmt',
#'MembershipDues', 'MembershipDuesAmt',
'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
#'FundraisingActivities',
#'FundraisingActivitiesInd',
'FundraisingAmt',
'FundraisingDirectExpenses', 'FundraisingDirectExpensesAmt', 'FundraisingEvents',
'FundraisingGrossIncomeAmt',
]][5:20]
'''
"\ndf2[[ 'TaxPeriod',\n #'CYContributionsGrantsAmt', 'FederatedCampaigns', 'FederatedCampaignsAmt', \n #'MembershipDues', 'MembershipDuesAmt',\n 'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',\n #'FundraisingActivities', \n #'FundraisingActivitiesInd', \n 'FundraisingAmt', \n 'FundraisingDirectExpenses', 'FundraisingDirectExpensesAmt', 'FundraisingEvents', \n 'FundraisingGrossIncomeAmt',\n ]][5:20]\n"
file_list_columns = ['OrganizationName', 'ObjectId', 'URL', 'SubmittedOn', 'DLN', 'LastUpdated', 'TaxPeriod',
'IsElectronic', 'FormType', '_id', 'IsAvailable', 'EIN']
SOX_columns = ['WhistleblowerPolicy', 'WhistleblowerPolicyInd',
'ConflictOfInterestPolicyInd', 'ConflictOfInterestPolicy',
'DocumentRetentionPolicy', 'DocumentRetentionPolicyInd',]
revenue_columns = ['CYTotalRevenueAmt', 'TotalRevenue', 'TotalRevenueCurrentYear']
efficiency_columns = ['CYTotalExpensesAmt', 'TotalExpensesCurrentYear',
'TotalFunctionalExpenses', 'TotalFunctionalExpensesGrp',
'TotalProgramServiceExpense', 'TotalProgramServiceExpensesAmt',]
#df['program_efficiency'] = df['program_expenses']/df['total_expenses']
complexity_columns = ['CYContributionsGrantsAmt', 'FederatedCampaigns', 'FederatedCampaignsAmt',
'MembershipDues', 'MembershipDuesAmt',
'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
#'FundraisingActivities', 'FundraisingActivitiesInd',
'FundraisingAmt',
#'FundraisingDirectExpenses', 'FundraisingDirectExpensesAmt', 'FundraisingEvents',
'FundraisingGrossIncomeAmt',
'RelatedOrganizations', 'RelatedOrganizationsAmt',
'GovernmentGrants', 'GovernmentGrantsAmt',
'PYProgramServiceRevenueAmt', 'CYProgramServiceRevenueAmt',
'ProgramServiceRevenue', 'ProgramServiceRevenueCY', 'ProgramServiceRevenueGrp',
'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt',
'CYInvestmentIncomeAmt', 'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp',
'OtherRevenueTotalAmt', 'OtherRevenueMiscGrp',
'TotalOtherRevenue', 'CYOtherRevenueAmt'
]
#[contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016',
# 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016',
# 'program_service_revenue_2016', 'other_revenue_2016']
cols = file_list_columns+SOX_columns+revenue_columns+efficiency_columns+complexity_columns
#total_revenue_logged + program_efficiency + age + complexity_2011
df[cols][:2]
OrganizationName | ObjectId | URL | SubmittedOn | DLN | LastUpdated | TaxPeriod | IsElectronic | FormType | _id | IsAvailable | EIN | WhistleblowerPolicy | WhistleblowerPolicyInd | ConflictOfInterestPolicyInd | ConflictOfInterestPolicy | DocumentRetentionPolicy | DocumentRetentionPolicyInd | CYTotalRevenueAmt | TotalRevenue | TotalRevenueCurrentYear | CYTotalExpensesAmt | TotalExpensesCurrentYear | TotalFunctionalExpenses | TotalFunctionalExpensesGrp | TotalProgramServiceExpense | TotalProgramServiceExpensesAmt | CYContributionsGrantsAmt | FederatedCampaigns | FederatedCampaignsAmt | MembershipDues | MembershipDuesAmt | NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | RelatedOrganizations | RelatedOrganizationsAmt | GovernmentGrants | GovernmentGrantsAmt | PYProgramServiceRevenueAmt | CYProgramServiceRevenueAmt | ProgramServiceRevenue | ProgramServiceRevenueCY | ProgramServiceRevenueGrp | TotalProgramServiceRevenue | TotalProgramServiceRevenueAmt | CYInvestmentIncomeAmt | InvestmentIncome | InvestmentIncomeCurrentYear | InvestmentIncomeGrp | OtherRevenueTotalAmt | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ANCHOR BUILDING FUND INC | 201503169349304780 | https://s3.amazonaws.com/irs-form-990/201503169349304780_public.xml | 2016-01-30 | 93493316047805 | 2016-03-21T17:23:53 | 201412 | True | 990 | 57cce7823ffc5a55fe7f54b5 | True | 271317072 | NaN | false | false | NaN | NaN | false | 325000 | NaN | NaN | 34400 | NaN | NaN | {u'TotalAmt': u'34400', u'ProgramServicesAmt': u'0', u'ManagementAndGeneralAmt': u'34400', u'FundraisingAmt': u'0'} | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | 325000 | {u'TotalRevenueColumnAmt': u'325000', u'ExclusionAmt': u'325000', u'Desc': u'SETTLEMENT'} | NaN | 325000 |
1 | INLAND BEHAVIORAL AND HEALTH SERVICES INC | 201501339349302635 | https://s3.amazonaws.com/irs-form-990/201501339349302635_public.xml | 2015-07-15 | 93493133026355 | 2016-03-21T17:23:53 | 201406 | True | 990 | 57cce87d3ffc5a55fe7f568c | True | 953246624 | NaN | true | true | NaN | NaN | true | 8825153 | NaN | NaN | 9016469 | NaN | NaN | {u'TotalAmt': u'9016469', u'ProgramServicesAmt': u'7477089', u'ManagementAndGeneralAmt': u'1539380', u'FundraisingAmt': u'0'} | NaN | 7477089 | 32937 | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumnAmt': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | 9457980 | 8758017 | NaN | NaN | [{u'TotalRevenueColumnAmt': u'265216', u'BusinessCd': u'624100', u'RelatedOrExemptFuncIncomeAmt': u'265216', u'Desc': u'Drug/Alcohol Treatment'}, {u'TotalRevenueColumnAmt': u'3987366', u'BusinessCd': u'624100', u'RelatedOrExemptFuncIncomeAmt': u'... | NaN | 8758017 | 107 | NaN | NaN | {u'TotalRevenueColumnAmt': u'107', u'ExclusionAmt': u'107'} | 34092 | [{u'TotalRevenueColumnAmt': u'7299', u'ExclusionAmt': u'7299', u'Desc': u'Gain on A/C ins. claim'}, {u'TotalRevenueColumnAmt': u'26793', u'ExclusionAmt': u'26793', u'Desc': u'Other income'}] | NaN | 34092 |
#file_list_columns = ['OrganizationName', 'ObjectId', 'URL', 'SubmittedOn', 'DLN', 'LastUpdated', 'TaxPeriod',
# 'IsElectronic', 'FormType', '_id', 'IsAvailable', 'EIN']
#cols = file_list_columns + ['MaterialDiversionOrMisuse', 'MaterialDiversionOrMisuseInd']
#df[cols][:1]
OrganizationName | ObjectId | URL | SubmittedOn | DLN | LastUpdated | TaxPeriod | IsElectronic | FormType | _id | IsAvailable | EIN | MaterialDiversionOrMisuse | MaterialDiversionOrMisuseInd | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ANCHOR BUILDING FUND INC | 201503169349304780 | https://s3.amazonaws.com/irs-form-990/201503169349304780_public.xml | 2016-01-30 | 93493316047805 | 2016-03-21T17:23:53 | 201412 | True | 990 | 57cce7823ffc5a55fe7f54b5 | True | 271317072 | NaN | true |
#print df[cols[-1]].value_counts(), '\n'
#print df[cols[-2]].value_counts(), '\n'
0 93 false 89 true 8 1 4 Name: MaterialDiversionOrMisuseInd, dtype: int64 0 185 false 157 1 1 true 1 Name: MaterialDiversionOrMisuse, dtype: int64
152 unique EINs in dataframe.
print len(df['EIN'].tolist())
print len(set(df['EIN'].tolist()))
538 152
All except one are 990s.
df['FormType'].value_counts()
990 537 990EZ 1 Name: FormType, dtype: int64
print len(df)
print df['TaxPeriod'].value_counts().sum(), '\n'
print df['IsAvailable'].value_counts(), '\n'
print df['IsElectronic'].value_counts(), '\n'
538 538 True 538 Name: IsAvailable, dtype: int64 True 538 Name: IsElectronic, dtype: int64
df['FYE'] = 'FY' + df['TaxPeriod'].str[:4]
df['FYE'][:2]
0 FY2014 1 FY2014 Name: FYE, dtype: object
df['FYE'].value_counts()
FY2013 120 FY2014 117 FY2012 112 FY2011 97 FY2010 77 FY2015 15 Name: FYE, dtype: int64
file_list_columns = ['EIN', 'OrganizationName', 'URL', 'SubmittedOn', 'TaxPeriod',
#'ObjectId', '_id', 'IsAvailable', 'DLN', 'LastUpdated', 'IsElectronic', 'FormType',
]
SOX_columns = ['WhistleblowerPolicy', 'WhistleblowerPolicyInd',
'ConflictOfInterestPolicyInd', 'ConflictOfInterestPolicy',
'DocumentRetentionPolicy', 'DocumentRetentionPolicyInd',]
revenue_columns = ['CYTotalRevenueAmt', 'TotalRevenue', 'TotalRevenueCurrentYear']
efficiency_columns = ['CYTotalExpensesAmt', 'TotalExpensesCurrentYear',
'TotalFunctionalExpenses', 'TotalFunctionalExpensesGrp',
'TotalProgramServiceExpense', 'TotalProgramServiceExpensesAmt',]
#df['program_efficiency'] = df['program_expenses']/df['total_expenses']
complexity_columns = ['CYContributionsGrantsAmt', 'FederatedCampaigns', 'FederatedCampaignsAmt',
'MembershipDues', 'MembershipDuesAmt',
'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
#'FundraisingActivities', 'FundraisingActivitiesInd',
'FundraisingAmt',
#'FundraisingDirectExpenses', 'FundraisingDirectExpensesAmt', 'FundraisingEvents',
'FundraisingGrossIncomeAmt',
'RelatedOrganizations', 'RelatedOrganizationsAmt',
'GovernmentGrants', 'GovernmentGrantsAmt',
'PYProgramServiceRevenueAmt', 'CYProgramServiceRevenueAmt',
'ProgramServiceRevenue', 'ProgramServiceRevenueCY', 'ProgramServiceRevenueGrp',
'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt',
'CYInvestmentIncomeAmt', 'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp',
'OtherRevenueTotalAmt', 'OtherRevenueMiscGrp',
'TotalOtherRevenue', 'CYOtherRevenueAmt'
]
#[contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016',
# 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016',
# 'program_service_revenue_2016', 'other_revenue_2016']
cols = file_list_columns+SOX_columns+revenue_columns+efficiency_columns+complexity_columns
#total_revenue_logged + program_efficiency + age + complexity_2011
df[cols][:2]
cols = ['FYE']+file_list_columns+SOX_columns+revenue_columns+efficiency_columns+complexity_columns
#total_revenue_logged + program_efficiency + age + complexity_2011
df[cols][:2]
FYE | EIN | OrganizationName | URL | SubmittedOn | TaxPeriod | WhistleblowerPolicy | WhistleblowerPolicyInd | ConflictOfInterestPolicyInd | ConflictOfInterestPolicy | DocumentRetentionPolicy | DocumentRetentionPolicyInd | CYTotalRevenueAmt | TotalRevenue | TotalRevenueCurrentYear | CYTotalExpensesAmt | TotalExpensesCurrentYear | TotalFunctionalExpenses | TotalFunctionalExpensesGrp | TotalProgramServiceExpense | TotalProgramServiceExpensesAmt | CYContributionsGrantsAmt | FederatedCampaigns | FederatedCampaignsAmt | MembershipDues | MembershipDuesAmt | NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | RelatedOrganizations | RelatedOrganizationsAmt | GovernmentGrants | GovernmentGrantsAmt | PYProgramServiceRevenueAmt | CYProgramServiceRevenueAmt | ProgramServiceRevenue | ProgramServiceRevenueCY | ProgramServiceRevenueGrp | TotalProgramServiceRevenue | TotalProgramServiceRevenueAmt | CYInvestmentIncomeAmt | InvestmentIncome | InvestmentIncomeCurrentYear | InvestmentIncomeGrp | OtherRevenueTotalAmt | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | FY2014 | 271317072 | ANCHOR BUILDING FUND INC | https://s3.amazonaws.com/irs-form-990/201503169349304780_public.xml | 2016-01-30 | 201412 | NaN | false | false | NaN | NaN | false | 325000 | NaN | NaN | 34400 | NaN | NaN | {u'TotalAmt': u'34400', u'ProgramServicesAmt': u'0', u'ManagementAndGeneralAmt': u'34400', u'FundraisingAmt': u'0'} | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | 325000 | {u'TotalRevenueColumnAmt': u'325000', u'ExclusionAmt': u'325000', u'Desc': u'SETTLEMENT'} | NaN | 325000 |
1 | FY2014 | 953246624 | INLAND BEHAVIORAL AND HEALTH SERVICES INC | https://s3.amazonaws.com/irs-form-990/201501339349302635_public.xml | 2015-07-15 | 201406 | NaN | true | true | NaN | NaN | true | 8825153 | NaN | NaN | 9016469 | NaN | NaN | {u'TotalAmt': u'9016469', u'ProgramServicesAmt': u'7477089', u'ManagementAndGeneralAmt': u'1539380', u'FundraisingAmt': u'0'} | NaN | 7477089 | 32937 | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumnAmt': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | 9457980 | 8758017 | NaN | NaN | [{u'TotalRevenueColumnAmt': u'265216', u'BusinessCd': u'624100', u'RelatedOrExemptFuncIncomeAmt': u'265216', u'Desc': u'Drug/Alcohol Treatment'}, {u'TotalRevenueColumnAmt': u'3987366', u'BusinessCd': u'624100', u'RelatedOrExemptFuncIncomeAmt': u'... | NaN | 8758017 | 107 | NaN | NaN | {u'TotalRevenueColumnAmt': u'107', u'ExclusionAmt': u'107'} | 34092 | [{u'TotalRevenueColumnAmt': u'7299', u'ExclusionAmt': u'7299', u'Desc': u'Gain on A/C ins. claim'}, {u'TotalRevenueColumnAmt': u'26793', u'ExclusionAmt': u'26793', u'Desc': u'Other income'}] | NaN | 34092 |
#cols = ['FYE'] + file_list_columns + ['MaterialDiversionOrMisuse', 'MaterialDiversionOrMisuseInd']
#df = df.sort_values(by=['EIN', 'FYE'], ascending=[1, 0])[cols]
df = df.sort_values(by=['EIN', 'FYE'], ascending=[1, 0])[cols]
print len(df)
df[:2]
538
FYE | EIN | OrganizationName | URL | SubmittedOn | TaxPeriod | WhistleblowerPolicy | WhistleblowerPolicyInd | ConflictOfInterestPolicyInd | ConflictOfInterestPolicy | DocumentRetentionPolicy | DocumentRetentionPolicyInd | CYTotalRevenueAmt | TotalRevenue | TotalRevenueCurrentYear | CYTotalExpensesAmt | TotalExpensesCurrentYear | TotalFunctionalExpenses | TotalFunctionalExpensesGrp | TotalProgramServiceExpense | TotalProgramServiceExpensesAmt | CYContributionsGrantsAmt | FederatedCampaigns | FederatedCampaignsAmt | MembershipDues | MembershipDuesAmt | NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | RelatedOrganizations | RelatedOrganizationsAmt | GovernmentGrants | GovernmentGrantsAmt | PYProgramServiceRevenueAmt | CYProgramServiceRevenueAmt | ProgramServiceRevenue | ProgramServiceRevenueCY | ProgramServiceRevenueGrp | TotalProgramServiceRevenue | TotalProgramServiceRevenueAmt | CYInvestmentIncomeAmt | InvestmentIncome | InvestmentIncomeCurrentYear | InvestmentIncomeGrp | OtherRevenueTotalAmt | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
252 | FY2012 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201303199349303505_public.xml | 2013-12-31 | 201212 | false | NaN | NaN | true | true | NaN | NaN | {u'TotalRevenueColumn': u'2515399', u'ExclusionAmount': u'2063'} | 2515399 | NaN | 2100922 | {u'ManagementAndGeneral': u'362113', u'ProgramServices': u'1598626', u'Fundraising': u'140183', u'Total': u'2100922'} | NaN | 1598626 | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | 185408 | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'63', u'ExclusionAmount': u'63'} | 63 | NaN | NaN | NaN | 0 | NaN |
433 | FY2011 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201320249349300417_public.xml | 2013-02-14 | 201112 | false | NaN | NaN | true | true | NaN | NaN | {u'TotalRevenueColumn': u'2161209'} | 2161209 | NaN | 2481310 | {u'ManagementAndGeneral': u'527949', u'ProgramServices': u'1715793', u'Fundraising': u'237568', u'Total': u'2481310'} | NaN | 1715793 | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'0'} | 0 | NaN | NaN | NaN | 0 | NaN |
#print len(df)
#df[:2]
538
FYE | OrganizationName | ObjectId | URL | SubmittedOn | DLN | LastUpdated | TaxPeriod | IsElectronic | FormType | _id | IsAvailable | EIN | MaterialDiversionOrMisuse | MaterialDiversionOrMisuseInd | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
252 | FY2012 | NEWARK NOW INC | 201303199349303505 | https://s3.amazonaws.com/irs-form-990/201303199349303505_public.xml | 2013-12-31 | 93493319035053 | 2016-03-21T17:23:53 | 201212 | True | 990 | 57cce8673ffc5a55fe7f5664 | True | 030498214 | false | NaN |
433 | FY2011 | NEWARK NOW INC | 201320249349300417 | https://s3.amazonaws.com/irs-form-990/201320249349300417_public.xml | 2013-02-14 | 93493024004173 | 2016-03-21T17:23:53 | 201112 | True | 990 | 57cce7d63ffc5a55fe7f554d | True | 030498214 | false | NaN |
print df[revenue_columns[0]].value_counts().sum()
print df[revenue_columns[1]].value_counts().sum()
print df[revenue_columns[2]].value_counts().sum()
194 344 344
df[revenue_columns][:5]
CYTotalRevenueAmt | TotalRevenue | TotalRevenueCurrentYear | |
---|---|---|---|
252 | NaN | {u'TotalRevenueColumn': u'2515399', u'ExclusionAmount': u'2063'} | 2515399 |
433 | NaN | {u'TotalRevenueColumn': u'2161209'} | 2161209 |
78 | NaN | {u'TotalRevenueColumn': u'3990564'} | 3990564 |
333 | 22995526 | NaN | NaN |
12 | NaN | {u'TotalRevenueColumn': u'25923449', u'RelatedOrExemptFunctionIncome': u'451157', u'ExclusionAmount': u'1549366'} | 25923449 |
df['tot_rev'] = np.nan
print len(df[df['tot_rev'].notnull()])
df['tot_rev'] = np.where(df['TotalRevenueCurrentYear'].notnull(), df['TotalRevenueCurrentYear'], df['tot_rev'] )
print len(df[df['tot_rev'].notnull()])
df['tot_rev'] = np.where( ((df['tot_rev'].isnull()) & (df['CYTotalRevenueAmt'].notnull())),
df['CYTotalRevenueAmt'], df['tot_rev'] )
print len(df[df['tot_rev'].notnull()])
df['tot_rev'] = df['tot_rev'].astype('int')
print len(df[df['tot_rev'].notnull()])
0 344 538 538
print df.columns.tolist()
['FYE', u'EIN', u'OrganizationName', u'URL', u'SubmittedOn', u'TaxPeriod', u'WhistleblowerPolicy', u'WhistleblowerPolicyInd', u'ConflictOfInterestPolicyInd', u'ConflictOfInterestPolicy', u'DocumentRetentionPolicy', u'DocumentRetentionPolicyInd', u'CYTotalRevenueAmt', u'TotalRevenue', u'TotalRevenueCurrentYear', u'CYTotalExpensesAmt', u'TotalExpensesCurrentYear', u'TotalFunctionalExpenses', u'TotalFunctionalExpensesGrp', u'TotalProgramServiceExpense', u'TotalProgramServiceExpensesAmt', u'CYContributionsGrantsAmt', u'FederatedCampaigns', u'FederatedCampaignsAmt', u'MembershipDues', u'MembershipDuesAmt', u'NetIncomeFromFundraisingEvents', u'NetIncmFromFundraisingEvtGrp', u'FundraisingAmt', u'FundraisingGrossIncomeAmt', u'RelatedOrganizations', u'RelatedOrganizationsAmt', u'GovernmentGrants', u'GovernmentGrantsAmt', u'PYProgramServiceRevenueAmt', u'CYProgramServiceRevenueAmt', u'ProgramServiceRevenue', u'ProgramServiceRevenueCY', u'ProgramServiceRevenueGrp', u'TotalProgramServiceRevenue', u'TotalProgramServiceRevenueAmt', u'CYInvestmentIncomeAmt', u'InvestmentIncome', u'InvestmentIncomeCurrentYear', u'InvestmentIncomeGrp', u'OtherRevenueTotalAmt', u'OtherRevenueMiscGrp', u'TotalOtherRevenue', u'CYOtherRevenueAmt', 'tot_rev']
new_cols = ['FYE', 'EIN', 'OrganizationName', 'URL', 'SubmittedOn', 'TaxPeriod',
#'CYTotalRevenueAmt', 'TotalRevenue', 'TotalRevenueCurrentYear',
'tot_rev',
'WhistleblowerPolicy', 'WhistleblowerPolicyInd',
'ConflictOfInterestPolicyInd', 'ConflictOfInterestPolicy',
'DocumentRetentionPolicy', 'DocumentRetentionPolicyInd',
'CYTotalExpensesAmt', 'TotalExpensesCurrentYear',
'TotalFunctionalExpenses', 'TotalFunctionalExpensesGrp',
'TotalProgramServiceExpense', 'TotalProgramServiceExpensesAmt',
'CYContributionsGrantsAmt',
'FederatedCampaigns', 'FederatedCampaignsAmt',
'MembershipDues', 'MembershipDuesAmt',
'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
'FundraisingAmt', 'FundraisingGrossIncomeAmt',
'RelatedOrganizations', 'RelatedOrganizationsAmt',
'GovernmentGrants', 'GovernmentGrantsAmt',
'PYProgramServiceRevenueAmt', 'CYProgramServiceRevenueAmt', 'ProgramServiceRevenue',
'ProgramServiceRevenueCY', 'ProgramServiceRevenueGrp', 'TotalProgramServiceRevenue',
'TotalProgramServiceRevenueAmt',
'CYInvestmentIncomeAmt', 'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp',
'OtherRevenueTotalAmt', 'OtherRevenueMiscGrp', 'TotalOtherRevenue', 'CYOtherRevenueAmt']
df[new_cols][:5]
FYE | EIN | OrganizationName | URL | SubmittedOn | TaxPeriod | tot_rev | WhistleblowerPolicy | WhistleblowerPolicyInd | ConflictOfInterestPolicyInd | ConflictOfInterestPolicy | DocumentRetentionPolicy | DocumentRetentionPolicyInd | CYTotalExpensesAmt | TotalExpensesCurrentYear | TotalFunctionalExpenses | TotalFunctionalExpensesGrp | TotalProgramServiceExpense | TotalProgramServiceExpensesAmt | CYContributionsGrantsAmt | FederatedCampaigns | FederatedCampaignsAmt | MembershipDues | MembershipDuesAmt | NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | RelatedOrganizations | RelatedOrganizationsAmt | GovernmentGrants | GovernmentGrantsAmt | PYProgramServiceRevenueAmt | CYProgramServiceRevenueAmt | ProgramServiceRevenue | ProgramServiceRevenueCY | ProgramServiceRevenueGrp | TotalProgramServiceRevenue | TotalProgramServiceRevenueAmt | CYInvestmentIncomeAmt | InvestmentIncome | InvestmentIncomeCurrentYear | InvestmentIncomeGrp | OtherRevenueTotalAmt | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
252 | FY2012 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201303199349303505_public.xml | 2013-12-31 | 201212 | 2515399 | false | NaN | NaN | true | true | NaN | NaN | 2100922 | {u'ManagementAndGeneral': u'362113', u'ProgramServices': u'1598626', u'Fundraising': u'140183', u'Total': u'2100922'} | NaN | 1598626 | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | 185408 | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'63', u'ExclusionAmount': u'63'} | 63 | NaN | NaN | NaN | 0 | NaN |
433 | FY2011 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201320249349300417_public.xml | 2013-02-14 | 201112 | 2161209 | false | NaN | NaN | true | true | NaN | NaN | 2481310 | {u'ManagementAndGeneral': u'527949', u'ProgramServices': u'1715793', u'Fundraising': u'237568', u'Total': u'2481310'} | NaN | 1715793 | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'0'} | 0 | NaN | NaN | NaN | 0 | NaN |
78 | FY2010 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201220909349300327_public.xml | 2012-05-25 | 201012 | 3990564 | false | NaN | NaN | true | true | NaN | NaN | 3912643 | {u'ManagementAndGeneral': u'908524', u'ProgramServices': u'2930785', u'Fundraising': u'73334', u'Total': u'3912643'} | NaN | 2930785 | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'0'} | 0 | NaN | NaN | NaN | 0 | NaN |
333 | FY2014 | 042129889 | SCHEPENS EYE RESEARCH INSTITUTE INC | https://s3.amazonaws.com/irs-form-990/201502299349301700_public.xml | 2015-09-21 | 201409 | 22995526 | NaN | true | true | NaN | NaN | true | 25626262 | NaN | NaN | {u'TotalAmt': u'25626262', u'ProgramServicesAmt': u'18485740', u'ManagementAndGeneralAmt': u'7140522', u'FundraisingAmt': u'0'} | NaN | 18485740 | 20437991 | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumnAmt': u'0'} | NaN | NaN | NaN | 2053228 | NaN | 16124016 | 0 | 0 | NaN | NaN | NaN | NaN | 0 | 1539746 | NaN | NaN | {u'TotalRevenueColumnAmt': u'362651', u'ExclusionAmt': u'362651'} | 428192 | [{u'TotalRevenueColumnAmt': u'33550', u'BusinessCd': u'531390', u'RelatedOrExemptFuncIncomeAmt': u'33550', u'Desc': u'PARKING INCOME'}, {u'TotalRevenueColumnAmt': u'394642', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'394642', u'... | NaN | 1017789 |
12 | FY2013 | 042129889 | SCHEPENS EYE RESEARCH INSTITUTE INC | https://s3.amazonaws.com/irs-form-990/201442269349304504_public.xml | 2014-10-28 | 201309 | 25923449 | true | NaN | NaN | true | true | NaN | NaN | 26224173 | {u'ManagementAndGeneral': u'6501012', u'ProgramServices': u'19280493', u'Fundraising': u'442668', u'Total': u'26224173'} | NaN | 19280493 | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | 2466016 | NaN | 16660756 | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'398381', u'ExclusionAmount': u'398381'} | 1195606 | NaN | NaN | NaN | 451157 | NaN |
df.describe().T
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
tot_rev | 538.0 | 5.857063e+07 | 3.234413e+08 | -218265025.0 | 1320586.25 | 4261393.5 | 13341627.0 | 3.741635e+09 |
print df['WhistleblowerPolicy'].value_counts().sum()
print df['WhistleblowerPolicyInd'].value_counts().sum(), '\n'
print df['ConflictOfInterestPolicy'].value_counts().sum()
print df['ConflictOfInterestPolicyInd'].value_counts().sum(), '\n'
print df['DocumentRetentionPolicy'].value_counts().sum()
print df['DocumentRetentionPolicyInd'].value_counts().sum(), '\n'
344 194 344 194 344 194
NOTE: THIS VARIABLE JUST ADDED IN AD HOC -- IT WASN'T SAVED IN INITIAL ROUND.
df['material_diversion'] = np.nan
print len(df[df['material_diversion'].notnull()])
df['material_diversion'] = np.where(df['MaterialDiversionOrMisuse'].notnull(), df['MaterialDiversionOrMisuse'],
df['material_diversion'] )
print len(df[df['material_diversion'].notnull()])
df['material_diversion'] = np.where( ((df['material_diversion'].isnull()) &
(df['MaterialDiversionOrMisuseInd'].notnull())),
df['MaterialDiversionOrMisuseInd'], df['material_diversion'] )
print len(df[df['material_diversion'].notnull()])
df['material_diversion'].value_counts()
0 344 538
0 278 false 246 true 9 1 5 Name: material_diversion, dtype: int64
df['material_diversion'] = np.where( df['material_diversion']=='true', 1, df['material_diversion'] )
df['material_diversion'] = np.where( df['material_diversion']=='1', 1, df['material_diversion'] )
df['material_diversion'] = np.where( df['material_diversion']=='false', 0, df['material_diversion'] )
df['material_diversion'] = np.where( df['material_diversion']=='0', 0, df['material_diversion'] )
print len(df[df['material_diversion'].notnull()])
df['material_diversion'].value_counts()
538
0 524 1 14 Name: material_diversion, dtype: int64
df['whistleblower_policy'] = np.nan
print len(df[df['whistleblower_policy'].notnull()])
df['whistleblower_policy'] = np.where(df['WhistleblowerPolicy'].notnull(), df['WhistleblowerPolicy'],
df['whistleblower_policy'] )
print len(df[df['whistleblower_policy'].notnull()])
df['whistleblower_policy'] = np.where( ((df['whistleblower_policy'].isnull()) & (df['WhistleblowerPolicyInd'].notnull())),
df['WhistleblowerPolicyInd'], df['whistleblower_policy'] )
print len(df[df['whistleblower_policy'].notnull()])
df['whistleblower_policy'].value_counts()
0 344 538
1 179 true 138 false 117 0 104 Name: whistleblower_policy, dtype: int64
df['whistleblower_policy'] = np.where( df['whistleblower_policy']=='true', 1, df['whistleblower_policy'] )
df['whistleblower_policy'] = np.where( df['whistleblower_policy']=='1', 1, df['whistleblower_policy'] )
df['whistleblower_policy'] = np.where( df['whistleblower_policy']=='false', 0, df['whistleblower_policy'] )
df['whistleblower_policy'] = np.where( df['whistleblower_policy']=='0', 0, df['whistleblower_policy'] )
print len(df[df['whistleblower_policy'].notnull()])
df['whistleblower_policy'].value_counts()
538
1 317 0 221 Name: whistleblower_policy, dtype: int64
df[['whistleblower_policy', 'WhistleblowerPolicyInd', 'WhistleblowerPolicy' ]][:5]
whistleblower_policy | WhistleblowerPolicyInd | WhistleblowerPolicy | |
---|---|---|---|
252 | 0 | NaN | false |
433 | 0 | NaN | false |
78 | 0 | NaN | false |
333 | 1 | true | NaN |
12 | 1 | NaN | true |
df['conflict_of_interest_policy'] = np.nan
print len(df[df['conflict_of_interest_policy'].notnull()])
df['conflict_of_interest_policy'] = np.where(df['ConflictOfInterestPolicy'].notnull(), df['ConflictOfInterestPolicy'],
df['conflict_of_interest_policy'] )
print len(df[df['conflict_of_interest_policy'].notnull()])
df['conflict_of_interest_policy'] = np.where( ((df['conflict_of_interest_policy'].isnull())
& (df['ConflictOfInterestPolicyInd'].notnull())),
df['ConflictOfInterestPolicyInd'], df['conflict_of_interest_policy'] )
print len(df[df['conflict_of_interest_policy'].notnull()])
df['conflict_of_interest_policy'].value_counts()
0 344 538
1 262 true 192 false 63 0 21 Name: conflict_of_interest_policy, dtype: int64
df['conflict_of_interest_policy'] = np.where( df['conflict_of_interest_policy']=='true', 1, df['conflict_of_interest_policy'] )
df['conflict_of_interest_policy'] = np.where( df['conflict_of_interest_policy']=='1', 1, df['conflict_of_interest_policy'] )
df['conflict_of_interest_policy'] = np.where( df['conflict_of_interest_policy']=='false', 0, df['conflict_of_interest_policy'] )
df['conflict_of_interest_policy'] = np.where( df['conflict_of_interest_policy']=='0', 0, df['conflict_of_interest_policy'] )
print len(df[df['conflict_of_interest_policy'].notnull()])
df['conflict_of_interest_policy'].value_counts()
538
1 454 0 84 Name: conflict_of_interest_policy, dtype: int64
df[['conflict_of_interest_policy', 'ConflictOfInterestPolicyInd', 'ConflictOfInterestPolicy' ]][:5]
conflict_of_interest_policy | ConflictOfInterestPolicyInd | ConflictOfInterestPolicy | |
---|---|---|---|
252 | 1 | NaN | true |
433 | 1 | NaN | true |
78 | 1 | NaN | true |
333 | 1 | true | NaN |
12 | 1 | NaN | true |
df['records_retention_policy'] = np.nan
print len(df[df['records_retention_policy'].notnull()])
df['records_retention_policy'] = np.where(df['DocumentRetentionPolicy'].notnull(), df['DocumentRetentionPolicy'],
df['records_retention_policy'] )
print len(df[df['records_retention_policy'].notnull()])
df['records_retention_policy'] = np.where( ((df['records_retention_policy'].isnull())
& (df['DocumentRetentionPolicyInd'].notnull())),
df['DocumentRetentionPolicyInd'], df['records_retention_policy'] )
print len(df[df['records_retention_policy'].notnull()])
df['records_retention_policy'].value_counts()
0 344 538
1 224 true 159 false 96 0 59 Name: records_retention_policy, dtype: int64
df['records_retention_policy'] = np.where( df['records_retention_policy']=='true', 1, df['records_retention_policy'] )
df['records_retention_policy'] = np.where( df['records_retention_policy']=='1', 1, df['records_retention_policy'] )
df['records_retention_policy'] = np.where( df['records_retention_policy']=='false', 0, df['records_retention_policy'] )
df['records_retention_policy'] = np.where( df['records_retention_policy']=='0', 0, df['records_retention_policy'] )
print len(df[df['records_retention_policy'].notnull()])
df['records_retention_policy'].value_counts()
538
1 383 0 155 Name: records_retention_policy, dtype: int64
df[['records_retention_policy', 'DocumentRetentionPolicyInd', 'DocumentRetentionPolicy' ]][:5]
records_retention_policy | DocumentRetentionPolicyInd | DocumentRetentionPolicy | |
---|---|---|---|
252 | 1 | NaN | true |
433 | 1 | NaN | true |
78 | 1 | NaN | true |
333 | 1 | true | NaN |
12 | 1 | NaN | true |
print df.columns.tolist()
['FYE', u'EIN', u'OrganizationName', u'URL', u'SubmittedOn', u'TaxPeriod', u'WhistleblowerPolicy', u'WhistleblowerPolicyInd', u'ConflictOfInterestPolicyInd', u'ConflictOfInterestPolicy', u'DocumentRetentionPolicy', u'DocumentRetentionPolicyInd', u'CYTotalRevenueAmt', u'TotalRevenue', u'TotalRevenueCurrentYear', u'CYTotalExpensesAmt', u'TotalExpensesCurrentYear', u'TotalFunctionalExpenses', u'TotalFunctionalExpensesGrp', u'TotalProgramServiceExpense', u'TotalProgramServiceExpensesAmt', u'CYContributionsGrantsAmt', u'FederatedCampaigns', u'FederatedCampaignsAmt', u'MembershipDues', u'MembershipDuesAmt', u'NetIncomeFromFundraisingEvents', u'NetIncmFromFundraisingEvtGrp', u'FundraisingAmt', u'FundraisingGrossIncomeAmt', u'RelatedOrganizations', u'RelatedOrganizationsAmt', u'GovernmentGrants', u'GovernmentGrantsAmt', u'PYProgramServiceRevenueAmt', u'CYProgramServiceRevenueAmt', u'ProgramServiceRevenue', u'ProgramServiceRevenueCY', u'ProgramServiceRevenueGrp', u'TotalProgramServiceRevenue', u'TotalProgramServiceRevenueAmt', u'CYInvestmentIncomeAmt', u'InvestmentIncome', u'InvestmentIncomeCurrentYear', u'InvestmentIncomeGrp', u'OtherRevenueTotalAmt', u'OtherRevenueMiscGrp', u'TotalOtherRevenue', u'CYOtherRevenueAmt', 'tot_rev', 'whistleblower_policy', 'conflict_of_interest_policy', 'records_retention_policy']
new_cols = ['FYE', 'EIN', 'OrganizationName', 'URL', 'SubmittedOn', 'TaxPeriod',
'tot_rev', 'whistleblower_policy', 'conflict_of_interest_policy', 'records_retention_policy',
#'WhistleblowerPolicy', u'WhistleblowerPolicyInd', u'ConflictOfInterestPolicyInd',
#'ConflictOfInterestPolicy', u'DocumentRetentionPolicy', u'DocumentRetentionPolicyInd',
#'CYTotalRevenueAmt', 'TotalRevenue', 'TotalRevenueCurrentYear',
'CYTotalExpensesAmt',
'TotalExpensesCurrentYear', 'TotalFunctionalExpenses', 'TotalFunctionalExpensesGrp',
'TotalProgramServiceExpense', 'TotalProgramServiceExpensesAmt', 'CYContributionsGrantsAmt',
'FederatedCampaigns', 'FederatedCampaignsAmt', u'MembershipDues', 'MembershipDuesAmt',
'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
'FundraisingAmt', 'FundraisingGrossIncomeAmt',
'RelatedOrganizations',
'RelatedOrganizationsAmt', 'GovernmentGrants', 'GovernmentGrantsAmt', 'PYProgramServiceRevenueAmt',
'CYProgramServiceRevenueAmt', 'ProgramServiceRevenue', 'ProgramServiceRevenueCY',
'ProgramServiceRevenueGrp', 'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt',
'CYInvestmentIncomeAmt', 'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp',
'OtherRevenueTotalAmt', 'OtherRevenueMiscGrp', 'TotalOtherRevenue', 'CYOtherRevenueAmt',
]
df = df[new_cols]
df[:5]
FYE | EIN | OrganizationName | URL | SubmittedOn | TaxPeriod | tot_rev | whistleblower_policy | conflict_of_interest_policy | records_retention_policy | CYTotalExpensesAmt | TotalExpensesCurrentYear | TotalFunctionalExpenses | TotalFunctionalExpensesGrp | TotalProgramServiceExpense | TotalProgramServiceExpensesAmt | CYContributionsGrantsAmt | FederatedCampaigns | FederatedCampaignsAmt | MembershipDues | MembershipDuesAmt | NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | RelatedOrganizations | RelatedOrganizationsAmt | GovernmentGrants | GovernmentGrantsAmt | PYProgramServiceRevenueAmt | CYProgramServiceRevenueAmt | ProgramServiceRevenue | ProgramServiceRevenueCY | ProgramServiceRevenueGrp | TotalProgramServiceRevenue | TotalProgramServiceRevenueAmt | CYInvestmentIncomeAmt | InvestmentIncome | InvestmentIncomeCurrentYear | InvestmentIncomeGrp | OtherRevenueTotalAmt | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
252 | FY2012 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201303199349303505_public.xml | 2013-12-31 | 201212 | 2515399 | 0 | 1 | 1 | NaN | 2100922 | {u'ManagementAndGeneral': u'362113', u'ProgramServices': u'1598626', u'Fundraising': u'140183', u'Total': u'2100922'} | NaN | 1598626 | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | 185408 | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'63', u'ExclusionAmount': u'63'} | 63 | NaN | NaN | NaN | 0 | NaN |
433 | FY2011 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201320249349300417_public.xml | 2013-02-14 | 201112 | 2161209 | 0 | 1 | 1 | NaN | 2481310 | {u'ManagementAndGeneral': u'527949', u'ProgramServices': u'1715793', u'Fundraising': u'237568', u'Total': u'2481310'} | NaN | 1715793 | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'0'} | 0 | NaN | NaN | NaN | 0 | NaN |
78 | FY2010 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201220909349300327_public.xml | 2012-05-25 | 201012 | 3990564 | 0 | 1 | 1 | NaN | 3912643 | {u'ManagementAndGeneral': u'908524', u'ProgramServices': u'2930785', u'Fundraising': u'73334', u'Total': u'3912643'} | NaN | 2930785 | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'0'} | 0 | NaN | NaN | NaN | 0 | NaN |
333 | FY2014 | 042129889 | SCHEPENS EYE RESEARCH INSTITUTE INC | https://s3.amazonaws.com/irs-form-990/201502299349301700_public.xml | 2015-09-21 | 201409 | 22995526 | 1 | 1 | 1 | 25626262 | NaN | NaN | {u'TotalAmt': u'25626262', u'ProgramServicesAmt': u'18485740', u'ManagementAndGeneralAmt': u'7140522', u'FundraisingAmt': u'0'} | NaN | 18485740 | 20437991 | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumnAmt': u'0'} | NaN | NaN | NaN | 2053228 | NaN | 16124016 | 0 | 0 | NaN | NaN | NaN | NaN | 0 | 1539746 | NaN | NaN | {u'TotalRevenueColumnAmt': u'362651', u'ExclusionAmt': u'362651'} | 428192 | [{u'TotalRevenueColumnAmt': u'33550', u'BusinessCd': u'531390', u'RelatedOrExemptFuncIncomeAmt': u'33550', u'Desc': u'PARKING INCOME'}, {u'TotalRevenueColumnAmt': u'394642', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'394642', u'... | NaN | 1017789 |
12 | FY2013 | 042129889 | SCHEPENS EYE RESEARCH INSTITUTE INC | https://s3.amazonaws.com/irs-form-990/201442269349304504_public.xml | 2014-10-28 | 201309 | 25923449 | 1 | 1 | 1 | NaN | 26224173 | {u'ManagementAndGeneral': u'6501012', u'ProgramServices': u'19280493', u'Fundraising': u'442668', u'Total': u'26224173'} | NaN | 19280493 | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | 2466016 | NaN | 16660756 | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'398381', u'ExclusionAmount': u'398381'} | 1195606 | NaN | NaN | NaN | 451157 | NaN |
152 unique EINs in this dataset -- roughly half of the 321 '2016' donor advisories.
df.to_pickle('538 e-file 990s for 2016 donor advisory organizations, v2.pkl')
#cols = ['FYE'] + file_list_columns + ['MaterialDiversionOrMisuse', 'MaterialDiversionOrMisuseInd'] + ['material_diversion']
#df[cols].to_pickle('538 e-file 990s for 2016 donor advisory organizations (material diversions only).pkl')
#df['material_diversion'] = df['material_diversion'].astype('int')
#df[cols].describe().T
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
material_diversion | 538.0 | 0.026022 | 0.15935 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
#cols = ['FYE'] + file_list_columns + ['MaterialDiversionOrMisuse', 'MaterialDiversionOrMisuseInd'] + ['material_diversion']
#df[cols].to_pickle('538 e-file 990s for 2016 donor advisory organizations (material diversions only).pkl')
efficiency_columns = ['CYTotalExpensesAmt', 'TotalExpensesCurrentYear',
'TotalFunctionalExpenses', 'TotalFunctionalExpensesGrp',
'TotalProgramServiceExpense', 'TotalProgramServiceExpensesAmt',]
df[efficiency_columns][:4]
CYTotalExpensesAmt | TotalExpensesCurrentYear | TotalFunctionalExpenses | TotalFunctionalExpensesGrp | TotalProgramServiceExpense | TotalProgramServiceExpensesAmt | |
---|---|---|---|---|---|---|
252 | NaN | 2100922 | {u'ManagementAndGeneral': u'362113', u'ProgramServices': u'1598626', u'Fundraising': u'140183', u'Total': u'2100922'} | NaN | 1598626 | NaN |
433 | NaN | 2481310 | {u'ManagementAndGeneral': u'527949', u'ProgramServices': u'1715793', u'Fundraising': u'237568', u'Total': u'2481310'} | NaN | 1715793 | NaN |
78 | NaN | 3912643 | {u'ManagementAndGeneral': u'908524', u'ProgramServices': u'2930785', u'Fundraising': u'73334', u'Total': u'3912643'} | NaN | 2930785 | NaN |
333 | 25626262 | NaN | NaN | {u'TotalAmt': u'25626262', u'ProgramServicesAmt': u'18485740', u'ManagementAndGeneralAmt': u'7140522', u'FundraisingAmt': u'0'} | NaN | 18485740 |
df['total_expenses'] = np.nan
print len(df[df['total_expenses'].notnull()])
df['total_expenses'] = np.where(df['CYTotalExpensesAmt'].notnull(), df['CYTotalExpensesAmt'],
df['total_expenses'] )
print len(df[df['total_expenses'].notnull()])
df['total_expenses'] = np.where( ((df['total_expenses'].isnull()) & (df['TotalExpensesCurrentYear'].notnull())),
df['TotalExpensesCurrentYear'], df['total_expenses'] )
print len(df[df['total_expenses'].notnull()])
df['total_expenses'].describe().T
0 194 538
count 538 unique 536 top 271781 freq 2 Name: total_expenses, dtype: object
df['program_expenses'] = np.nan
print len(df[df['program_expenses'].notnull()])
df['program_expenses'] = np.where(df['TotalProgramServiceExpense'].notnull(), df['TotalProgramServiceExpense'],
df['program_expenses'] )
print len(df[df['program_expenses'].notnull()])
df['program_expenses'] = np.where( ((df['program_expenses'].isnull()) & (df['TotalProgramServiceExpensesAmt'].notnull())),
df['TotalProgramServiceExpensesAmt'], df['program_expenses'] )
print len(df[df['program_expenses'].notnull()])
df['program_expenses'].describe().T
0 336 524
count 524 unique 522 top 31689530 freq 2 Name: program_expenses, dtype: object
print len(df[df['program_expenses'].isnull()])
df[df['program_expenses'].isnull()]
14
FYE | EIN | OrganizationName | URL | SubmittedOn | TaxPeriod | tot_rev | whistleblower_policy | conflict_of_interest_policy | records_retention_policy | CYTotalExpensesAmt | TotalExpensesCurrentYear | TotalFunctionalExpenses | TotalFunctionalExpensesGrp | TotalProgramServiceExpense | TotalProgramServiceExpensesAmt | CYContributionsGrantsAmt | FederatedCampaigns | FederatedCampaignsAmt | MembershipDues | MembershipDuesAmt | NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | RelatedOrganizations | RelatedOrganizationsAmt | GovernmentGrants | GovernmentGrantsAmt | PYProgramServiceRevenueAmt | CYProgramServiceRevenueAmt | ProgramServiceRevenue | ProgramServiceRevenueCY | ProgramServiceRevenueGrp | TotalProgramServiceRevenue | TotalProgramServiceRevenueAmt | CYInvestmentIncomeAmt | InvestmentIncome | InvestmentIncomeCurrentYear | InvestmentIncomeGrp | OtherRevenueTotalAmt | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | total_expenses | program_expenses | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
172 | FY2014 | 132644507 | WORLD TRADE CENTERS ASSOCIATION INC | https://s3.amazonaws.com/irs-form-990/201543149349303504_public.xml | 2016-02-08 | 201412 | 7913104 | 1 | 1 | 1 | 7763846 | NaN | NaN | {u'TotalAmt': u'7763846'} | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 8834499 | 6812087 | NaN | NaN | [{u'TotalRevenueColumnAmt': u'3242920', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'3242920', u'Desc': u'MEMBERSHIP DUES'}, {u'TotalRevenueColumnAmt': u'2890000', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'28900... | NaN | 6812087 | 1075687 | NaN | NaN | {u'TotalRevenueColumnAmt': u'334806', u'ExclusionAmt': u'334806'} | 25330 | {u'TotalRevenueColumnAmt': u'25330', u'BusinessCd': u'900099', u'ExclusionAmt': u'25330', u'Desc': u'MISCELLANEOUS'} | NaN | 25330 | 7763846 | NaN |
404 | FY2013 | 132644507 | WORLD TRADE CENTERS ASSOCIATION INC | https://s3.amazonaws.com/irs-form-990/201433149349302383_public.xml | 2014-12-10 | 201312 | 9228395 | 1 | 1 | 1 | 7795021 | NaN | NaN | {u'TotalAmt': u'7795021'} | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7520245 | 8834499 | NaN | NaN | [{u'TotalRevenueColumnAmt': u'4774000', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'4774000', u'Desc': u'INITIATION FEES'}, {u'TotalRevenueColumnAmt': u'3347499', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'33474... | NaN | 8834499 | 391315 | NaN | NaN | {u'TotalRevenueColumnAmt': u'275100', u'ExclusionAmt': u'275100'} | 2581 | {u'TotalRevenueColumnAmt': u'2581', u'BusinessCd': u'900099', u'ExclusionAmt': u'2581', u'Desc': u'MISCELLANEOUS'} | NaN | 2581 | 7795021 | NaN |
431 | FY2012 | 132644507 | WORLD TRADE CENTERS ASSOCIATION INC | https://s3.amazonaws.com/irs-form-990/201313169349303646_public.xml | 2013-12-24 | 201212 | 7749706 | 1 | 1 | 1 | NaN | 6237509 | {u'Total': u'6237509'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | [{u'BusinessCode': u'900099', u'TotalRevenueColumn': u'3333245', u'RelatedOrExemptFunctionIncome': u'3333245', u'Description': u'MEMBERSHIP DUES'}, {u'BusinessCode': u'900099', u'TotalRevenueColumn': u'3278000', u'RelatedOrExemptFunctionIncome': ... | 7520245 | NaN | 7520245 | NaN | NaN | {u'TotalRevenueColumn': u'234964', u'ExclusionAmount': u'234964'} | 211074 | NaN | NaN | NaN | 18387 | NaN | 6237509 | NaN |
106 | FY2011 | 132644507 | WORLD TRADE CENTERS ASSOCIATION INC | https://s3.amazonaws.com/irs-form-990/201222279349301852_public.xml | 2012-11-15 | 201112 | 7072954 | 1 | 1 | 1 | NaN | 5880789 | {u'Total': u'5880789'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | [{u'BusinessCode': u'900099', u'TotalRevenueColumn': u'3313485', u'RelatedOrExemptFunctionIncome': u'3313485', u'Description': u'MEMBERSHIP DUES'}, {u'BusinessCode': u'900099', u'TotalRevenueColumn': u'3183000', u'RelatedOrExemptFunctionIncome': ... | 6926485 | NaN | 6926485 | NaN | NaN | {u'TotalRevenueColumn': u'237277', u'ExclusionAmount': u'237277'} | 144021 | NaN | NaN | NaN | 2448 | NaN | 5880789 | NaN |
72 | FY2010 | 132644507 | WORLD TRADE CENTERS ASSOCIATION INC | https://s3.amazonaws.com/irs-form-990/201123199349305807_public.xml | 2012-01-03 | 201012 | 6858206 | 1 | 1 | 1 | NaN | 6024086 | {u'Total': u'6024086'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | [{u'BusinessCode': u'900099', u'TotalRevenueColumn': u'3452000', u'RelatedOrExemptFunctionIncome': u'3452000', u'Description': u'INITIATION FEES'}, {u'BusinessCode': u'900099', u'TotalRevenueColumn': u'3134807', u'RelatedOrExemptFunctionIncome': ... | 6586807 | NaN | 6586807 | NaN | NaN | {u'TotalRevenueColumn': u'217696', u'ExclusionAmount': u'217696'} | 192714 | NaN | NaN | NaN | 78685 | NaN | 6024086 | NaN |
52 | FY2014 | 135579302 | UNITED HEBREW CEMETERY INC | https://s3.amazonaws.com/irs-form-990/201531949349300508_public.xml | 2015-07-29 | 201412 | 4223826 | 1 | 1 | 1 | 2023750 | NaN | NaN | {u'TotalAmt': u'2023750'} | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2572305 | 2466379 | NaN | NaN | [{u'TotalRevenueColumnAmt': u'1235180', u'RelatedOrExemptFuncIncomeAmt': u'1235180', u'Desc': u'SALES OF GRAVES & PLOTS'}, {u'TotalRevenueColumnAmt': u'601571', u'RelatedOrExemptFuncIncomeAmt': u'601571', u'Desc': u'INTERMENTS'}, {u'TotalRevenueC... | NaN | 2466379 | 1299578 | NaN | NaN | {u'TotalRevenueColumnAmt': u'481123', u'RelatedOrExemptFuncIncomeAmt': u'481123'} | 457869 | {u'TotalRevenueColumnAmt': u'457869', u'RelatedOrExemptFuncIncomeAmt': u'457869', u'Desc': u'PERPETUAL CARE ENDOWMENTS'} | NaN | 457869 | 2023750 | NaN |
406 | FY2013 | 135579302 | UNITED HEBREW CEMETERY INC | https://s3.amazonaws.com/irs-form-990/201433039349300503_public.xml | 2014-11-24 | 201312 | 4936178 | 0 | 0 | 0 | 4133761 | NaN | NaN | {u'TotalAmt': u'4133761', u'ProgramServicesAmt': u'0', u'ManagementAndGeneralAmt': u'0', u'FundraisingAmt': u'0'} | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumnAmt': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | 2196546 | 2572305 | NaN | NaN | [{u'TotalRevenueColumnAmt': u'43547', u'RelatedOrExemptFuncIncomeAmt': u'43547', u'Desc': u'CELL TOWER INCOME'}, {u'TotalRevenueColumnAmt': u'371441', u'RelatedOrExemptFuncIncomeAmt': u'371441', u'Desc': u'FOUNDATIONS'}, {u'TotalRevenueColumnAmt'... | NaN | 2572305 | 917828 | NaN | NaN | {u'TotalRevenueColumnAmt': u'584869', u'RelatedOrExemptFuncIncomeAmt': u'584869'} | 1446045 | [{u'TotalRevenueColumnAmt': u'1000000', u'RelatedOrExemptFuncIncomeAmt': u'1000000', u'Desc': u'LEGAL RECOVERY SETTLEMENT'}, {u'TotalRevenueColumnAmt': u'446045', u'RelatedOrExemptFuncIncomeAmt': u'446045', u'Desc': u'PERPETUAL CARE ENDOWMENTS'}] | NaN | 1446045 | 4133761 | NaN |
500 | FY2010 | 201737542 | GREATER CORNERSTONE COMMUNITY DEV | https://s3.amazonaws.com/irs-form-990/201111369349303781_public.xml | 2011-06-08 | 201006 | 548759 | 1 | 1 | 1 | NaN | 177901 | {u'ManagementAndGeneral': u'111025', u'ProgramServices': u'0', u'Fundraising': u'66876', u'Total': u'177901'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 177901 | NaN |
0 | FY2014 | 271317072 | ANCHOR BUILDING FUND INC | https://s3.amazonaws.com/irs-form-990/201503169349304780_public.xml | 2016-01-30 | 201412 | 325000 | 0 | 0 | 0 | 34400 | NaN | NaN | {u'TotalAmt': u'34400', u'ProgramServicesAmt': u'0', u'ManagementAndGeneralAmt': u'34400', u'FundraisingAmt': u'0'} | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | 325000 | {u'TotalRevenueColumnAmt': u'325000', u'ExclusionAmt': u'325000', u'Desc': u'SETTLEMENT'} | NaN | 325000 | 34400 | NaN |
277 | FY2014 | 330837780 | INTERNATIONAL CHRISTIAN BROADCASTING INC | https://s3.amazonaws.com/irs-form-990/201543209349314539_public.xml | 2016-02-22 | 201412 | 9926 | 1 | 1 | 1 | 3330 | NaN | NaN | {u'TotalAmt': u'3330', u'ProgramServicesAmt': u'0', u'ManagementAndGeneralAmt': u'3330', u'FundraisingAmt': u'0'} | NaN | NaN | 9225 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 9025 | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 701 | NaN | NaN | {u'TotalRevenueColumnAmt': u'701', u'ExclusionAmt': u'701'} | NaN | NaN | NaN | 0 | 3330 | NaN |
226 | FY2011 | 591951577 | ALLIED VETERANS OF THE WORLD INC | https://s3.amazonaws.com/irs-form-990/201243209349307874_public.xml | 2012-12-31 | 201112 | 1736929 | 1 | 1 | 1 | NaN | 1690623 | {u'Total': u'1690623'} | NaN | NaN | NaN | NaN | NaN | NaN | 26929 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'1710000'} | 1710000 | NaN | NaN | NaN | NaN | NaN | 1690623 | NaN |
213 | FY2010 | 591951577 | ALLIED VETERANS OF THE WORLD INC | https://s3.amazonaws.com/irs-form-990/201132279349301923_public.xml | 2011-09-07 | 201012 | 2496382 | 1 | 1 | 1 | NaN | 1161939 | {u'Total': u'1161939'} | NaN | NaN | NaN | NaN | NaN | NaN | 8056 | NaN | NaN | NaN | NaN | NaN | 2482845 | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | 5481 | NaN | 1161939 | NaN |
495 | FY2010 | 591951577 | ALLIED VETERANS OF THE WORLD INC | https://s3.amazonaws.com/irs-form-990/201243219349300739_public.xml | 2012-12-31 | 201012 | 2496382 | 1 | 1 | 1 | NaN | 1282149 | {u'Total': u'1282149'} | NaN | NaN | NaN | NaN | NaN | NaN | 8056 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'2482845'} | 2482845 | NaN | NaN | NaN | 5481 | NaN | 1282149 | NaN |
496 | FY2011 | 953553530 | NATIONAL MINORITY TV INC | https://s3.amazonaws.com/irs-form-990/201243219349300444_public.xml | 2012-12-31 | 201112 | 0 | 1 | 1 | 1 | NaN | 0 | {u'ManagementAndGeneral': u'0', u'ProgramServices': u'0', u'Fundraising': u'0', u'Total': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 0 | NaN |
df[df['program_expenses'].isnull()].to_csv('14 filings with zero program expenses.csv')
df['program_expenses'] = df['program_expenses'].astype('float')
df['total_expenses'] = df['total_expenses'].astype('float')
df['program_efficiency'] = df['program_expenses']/df['total_expenses']
print df['program_efficiency'].describe()
count 524.000000 mean 0.779227 std 0.191070 min 0.023817 25% NaN 50% NaN 75% NaN max 1.000000 Name: program_efficiency, dtype: float64
df[complexity_columns][:5]
CYContributionsGrantsAmt | FederatedCampaigns | FederatedCampaignsAmt | MembershipDues | MembershipDuesAmt | NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | RelatedOrganizations | RelatedOrganizationsAmt | GovernmentGrants | GovernmentGrantsAmt | PYProgramServiceRevenueAmt | CYProgramServiceRevenueAmt | ProgramServiceRevenue | ProgramServiceRevenueCY | ProgramServiceRevenueGrp | TotalProgramServiceRevenue | TotalProgramServiceRevenueAmt | CYInvestmentIncomeAmt | InvestmentIncome | InvestmentIncomeCurrentYear | InvestmentIncomeGrp | OtherRevenueTotalAmt | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
252 | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | 185408 | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'63', u'ExclusionAmount': u'63'} | 63 | NaN | NaN | NaN | 0 | NaN |
433 | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'0'} | 0 | NaN | NaN | NaN | 0 | NaN |
78 | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'0'} | 0 | NaN | NaN | NaN | 0 | NaN |
333 | 20437991 | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumnAmt': u'0'} | NaN | NaN | NaN | 2053228 | NaN | 16124016 | 0 | 0 | NaN | NaN | NaN | NaN | 0 | 1539746 | NaN | NaN | {u'TotalRevenueColumnAmt': u'362651', u'ExclusionAmt': u'362651'} | 428192 | [{u'TotalRevenueColumnAmt': u'33550', u'BusinessCd': u'531390', u'RelatedOrExemptFuncIncomeAmt': u'33550', u'Desc': u'PARKING INCOME'}, {u'TotalRevenueColumnAmt': u'394642', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'394642', u'... | NaN | 1017789 |
12 | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | 2466016 | NaN | 16660756 | NaN | NaN | NaN | NaN | 0 | NaN | 0 | NaN | NaN | {u'TotalRevenueColumn': u'398381', u'ExclusionAmount': u'398381'} | 1195606 | NaN | NaN | NaN | 451157 | NaN |
complexity_columns = ['CYContributionsGrantsAmt', 'FederatedCampaigns', 'FederatedCampaignsAmt',
'MembershipDues', 'MembershipDuesAmt',
'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
#'FundraisingActivities', 'FundraisingActivitiesInd',
'FundraisingAmt',
#'FundraisingDirectExpenses', 'FundraisingDirectExpensesAmt', 'FundraisingEvents',
'FundraisingGrossIncomeAmt',
'RelatedOrganizations', 'RelatedOrganizationsAmt',
'GovernmentGrants', 'GovernmentGrantsAmt',
'PYProgramServiceRevenueAmt', 'CYProgramServiceRevenueAmt',
'ProgramServiceRevenue', 'ProgramServiceRevenueCY', 'ProgramServiceRevenueGrp',
'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt',
'CYInvestmentIncomeAmt', 'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp',
'OtherRevenueTotalAmt', 'OtherRevenueMiscGrp',
'TotalOtherRevenue', 'CYOtherRevenueAmt'
]
#[contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016',
# 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016',
# 'program_service_revenue_2016', 'other_revenue_2016']
print len(df[df['FederatedCampaigns'].notnull()])
df['FederatedCampaigns'] = np.where( ((df['FederatedCampaigns'].isnull()) & (df['FederatedCampaignsAmt'].notnull())),
df['FederatedCampaignsAmt'], df['FederatedCampaigns'] )
print len(df[df['FederatedCampaigns'].notnull()])
df['FederatedCampaigns'].describe().T
21 40
count 40 unique 34 top 0 freq 6 Name: FederatedCampaigns, dtype: object
df[complexity_columns][20:25]
CYContributionsGrantsAmt | FederatedCampaigns | FederatedCampaignsAmt | MembershipDues | MembershipDuesAmt | NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | RelatedOrganizations | RelatedOrganizationsAmt | GovernmentGrants | GovernmentGrantsAmt | PYProgramServiceRevenueAmt | CYProgramServiceRevenueAmt | ProgramServiceRevenue | ProgramServiceRevenueCY | ProgramServiceRevenueGrp | TotalProgramServiceRevenue | TotalProgramServiceRevenueAmt | CYInvestmentIncomeAmt | InvestmentIncome | InvestmentIncomeCurrentYear | InvestmentIncomeGrp | OtherRevenueTotalAmt | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
101 | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {u'BusinessCode': u'611600', u'TotalRevenueColumn': u'194006', u'RelatedOrExemptFunctionIncome': u'194006', u'Description': u'TUITION'} | 194006 | NaN | 194006 | NaN | NaN | {u'TotalRevenueColumn': u'7', u'RelatedOrExemptFunctionIncome': u'7'} | 7 | NaN | NaN | NaN | 0 | NaN |
528 | 12294982 | NaN | NaN | NaN | NaN | NaN | NaN | 1817567 | NaN | NaN | NaN | NaN | 646173 | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 1549 | NaN | NaN | {u'TotalRevenueColumnAmt': u'1549'} | NaN | NaN | NaN | 0 |
401 | 12741996 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 629939 | NaN | 0 | NaN | NaN | NaN | NaN | NaN | 1656 | NaN | NaN | {u'TotalRevenueColumnAmt': u'1656'} | NaN | NaN | NaN | 0 |
361 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 437339 | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'2695'} | 2695 | NaN | NaN | NaN | NaN | NaN |
36 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 461226 | NaN | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | {u'TotalRevenueColumn': u'2317'} | 2317 | NaN | NaN | NaN | NaN | NaN |
print len(df[df['MembershipDues'].notnull()])
print len(df[df['MembershipDuesAmt'].notnull()])
df['MembershipDues'] = np.where( ((df['MembershipDues'].isnull()) & (df['MembershipDuesAmt'].notnull())),
df['MembershipDuesAmt'], df['MembershipDues'] )
print len(df[df['MembershipDues'].notnull()])
df['MembershipDues'].describe().T
32 12 44
count 44 unique 37 top 0 freq 7 Name: MembershipDues, dtype: object
print complexity_columns
['CYContributionsGrantsAmt', 'FederatedCampaigns', 'FederatedCampaignsAmt', 'MembershipDues', 'MembershipDuesAmt', 'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp', 'FundraisingAmt', 'FundraisingGrossIncomeAmt', 'RelatedOrganizations', 'RelatedOrganizationsAmt', 'GovernmentGrants', 'GovernmentGrantsAmt', 'PYProgramServiceRevenueAmt', 'CYProgramServiceRevenueAmt', 'ProgramServiceRevenue', 'ProgramServiceRevenueCY', 'ProgramServiceRevenueGrp', 'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt', 'CYInvestmentIncomeAmt', 'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp', 'OtherRevenueTotalAmt', 'OtherRevenueMiscGrp', 'TotalOtherRevenue', 'CYOtherRevenueAmt']
import ast
df[['NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp', 'FundraisingAmt',
'FundraisingGrossIncomeAmt']][4:12]
NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | |
---|---|---|---|---|
12 | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN |
481 | {u'TotalRevenueColumn': u'221449', u'ExclusionAmount': u'221449'} | NaN | NaN | NaN |
16 | {u'TotalRevenueColumn': u'23265', u'ExclusionAmount': u'23265'} | NaN | NaN | NaN |
130 | {u'TotalRevenueColumn': u'1322', u'RelatedOrExemptFunctionIncome': u'1322', u'UnrelatedBusinessRevenue': u'0', u'ExclusionAmount': u'0'} | NaN | NaN | NaN |
434 | NaN | {u'TotalRevenueColumnAmt': u'9429', u'ExclusionAmt': u'9429'} | 2985 | 28106 |
374 | {u'TotalRevenueColumn': u'2923', u'ExclusionAmount': u'2923'} | NaN | NaN | NaN |
118 | {u'TotalRevenueColumn': u'47210', u'ExclusionAmount': u'47210'} | NaN | NaN | NaN |
124 | {u'TotalRevenueColumn': u'99073', u'RelatedOrExemptFunctionIncome': u'99073'} | NaN | NaN | NaN |
print len(df[df['NetIncomeFromFundraisingEvents'].notnull()])
print len(df[df['NetIncmFromFundraisingEvtGrp'].notnull()])
df['NetIncomeFromFundraisingEvents'] = np.where( ((df['NetIncomeFromFundraisingEvents'].isnull())
& (df['NetIncmFromFundraisingEvtGrp'].notnull())),
df['NetIncmFromFundraisingEvtGrp'], df['NetIncomeFromFundraisingEvents'] )
print len(df[df['NetIncomeFromFundraisingEvents'].notnull()])
df['NetIncomeFromFundraisingEvents'].describe().T
144 84 228
count 228 unique 129 top {u'TotalRevenueColumn': u'0'} freq 66 Name: NetIncomeFromFundraisingEvents, dtype: object
df[['NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp', 'FundraisingAmt',
'FundraisingGrossIncomeAmt']][4:12]
NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | |
---|---|---|---|---|
12 | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN |
481 | {u'TotalRevenueColumn': u'221449', u'ExclusionAmount': u'221449'} | NaN | NaN | NaN |
16 | {u'TotalRevenueColumn': u'23265', u'ExclusionAmount': u'23265'} | NaN | NaN | NaN |
130 | {u'TotalRevenueColumn': u'1322', u'RelatedOrExemptFunctionIncome': u'1322', u'UnrelatedBusinessRevenue': u'0', u'ExclusionAmount': u'0'} | NaN | NaN | NaN |
434 | {u'TotalRevenueColumnAmt': u'9429', u'ExclusionAmt': u'9429'} | {u'TotalRevenueColumnAmt': u'9429', u'ExclusionAmt': u'9429'} | 2985 | 28106 |
374 | {u'TotalRevenueColumn': u'2923', u'ExclusionAmount': u'2923'} | NaN | NaN | NaN |
118 | {u'TotalRevenueColumn': u'47210', u'ExclusionAmount': u'47210'} | NaN | NaN | NaN |
124 | {u'TotalRevenueColumn': u'99073', u'RelatedOrExemptFunctionIncome': u'99073'} | NaN | NaN | NaN |
counter = 1
df['fundraising'] = np.nan
for index, row in df.iterrows():
#print row['NetIncomeFromFundraisingEvents']
#print type(ast.literal_eval(str(row['NetIncomeFromFundraisingEvents'])))
if pd.notnull(row['NetIncomeFromFundraisingEvents']):
try:
fr_revs = ast.literal_eval(str(row['NetIncomeFromFundraisingEvents']))
except:
df.ix[index, 'fundraising'] = np.nan
if 'TotalRevenueColumn' in fr_revs:
#print 'yes', fr_revs['TotalRevenueColumn']
df.ix[index, 'fundraising'] = fr_revs['TotalRevenueColumn']
elif 'TotalRevenueColumnAmt' in fr_revs:
#print 'yes', fr_revs['TotalRevenueColumnAmt']
df.ix[index, 'fundraising'] = fr_revs['TotalRevenueColumnAmt']
else:
df.ix[index, 'fundraising'] = np.nan
#counter +=1
#print counter
'''
counter = 1
df['fundraising'] = np.nan
for index, row in df.iterrows():
#print row['NetIncomeFromFundraisingEvents']
#print type(ast.literal_eval(str(row['NetIncomeFromFundraisingEvents'])))
if pd.notnull(row['NetIncomeFromFundraisingEvents']):
try:
fr_revs = ast.literal_eval(str(row['NetIncomeFromFundraisingEvents']))
except:
df[index, 'fundraising'] = np.nan
if 'TotalRevenueColumn' in fr_revs:
#print 'yes', fr_revs['TotalRevenueColumn']
df.ix[index, 'fundraising'] = fr_revs['TotalRevenueColumn']
elif 'TotalRevenueColumnAmt' in fr_revs:
#print 'yes', fr_revs['TotalRevenueColumnAmt']
df.ix[index, 'fundraising'] = fr_revs['TotalRevenueColumnAmt']
else:
df[index, 'fundraising'] = np.nan
#counter +=1
#print counter
'''
df[['fundraising', 'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp', 'FundraisingAmt',
'FundraisingGrossIncomeAmt']][4:8]
fundraising | NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | |
---|---|---|---|---|---|
12 | 0 | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN |
481 | 221449 | {u'TotalRevenueColumn': u'221449', u'ExclusionAmount': u'221449'} | NaN | NaN | NaN |
16 | 23265 | {u'TotalRevenueColumn': u'23265', u'ExclusionAmount': u'23265'} | NaN | NaN | NaN |
130 | 1322 | {u'TotalRevenueColumn': u'1322', u'RelatedOrExemptFunctionIncome': u'1322', u'UnrelatedBusinessRevenue': u'0', u'ExclusionAmount': u'0'} | NaN | NaN | NaN |
df[['fundraising', 'FundraisingAmt', 'FundraisingGrossIncomeAmt']][:12]
fundraising | FundraisingAmt | FundraisingGrossIncomeAmt | |
---|---|---|---|
252 | 0 | NaN | NaN |
433 | 0 | NaN | NaN |
78 | 0 | NaN | NaN |
333 | 0 | NaN | NaN |
12 | 0 | NaN | NaN |
481 | 221449 | NaN | NaN |
16 | 23265 | NaN | NaN |
130 | 1322 | NaN | NaN |
434 | 9429 | 2985 | 28106 |
374 | 2923 | NaN | NaN |
118 | 47210 | NaN | NaN |
124 | 99073 | NaN | NaN |
print len(df[df['fundraising'].notnull()])
print len(df[df['FundraisingAmt'].notnull()])
print len(df[df['FundraisingGrossIncomeAmt'].notnull()])
df['fundraising'] = np.where( ((df['fundraising']=='0') & (df['FundraisingAmt'].notnull())),
df['FundraisingAmt'], df['fundraising'] )
df['fundraising'] = np.where( ((df['fundraising']=='0') & (df['FundraisingGrossIncomeAmt'].notnull())),
df['FundraisingGrossIncomeAmt'], df['fundraising'] )
print len(df[df['fundraising'].notnull()])
df[['fundraising', 'FundraisingAmt', 'FundraisingGrossIncomeAmt']][:12]
228 32 53 228
fundraising | FundraisingAmt | FundraisingGrossIncomeAmt | |
---|---|---|---|
252 | 0 | NaN | NaN |
433 | 0 | NaN | NaN |
78 | 0 | NaN | NaN |
333 | 0 | NaN | NaN |
12 | 0 | NaN | NaN |
481 | 221449 | NaN | NaN |
16 | 23265 | NaN | NaN |
130 | 1322 | NaN | NaN |
434 | 9429 | 2985 | 28106 |
374 | 2923 | NaN | NaN |
118 | 47210 | NaN | NaN |
124 | 99073 | NaN | NaN |
df['fundraising'].describe().T
count 228 unique 132 top 0 freq 96 Name: fundraising, dtype: object
#df[df['fundraising']=='-38198'][['EIN', 'FYE', 'fundraising', 'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
# 'FundraisingAmt', 'FundraisingGrossIncomeAmt']]
df[['EIN', 'FYE', 'fundraising', 'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
'FundraisingAmt', 'FundraisingGrossIncomeAmt']][50:74]
EIN | FYE | fundraising | NetIncomeFromFundraisingEvents | NetIncmFromFundraisingEvtGrp | FundraisingAmt | FundraisingGrossIncomeAmt | |
---|---|---|---|---|---|---|---|
106 | 132644507 | FY2011 | NaN | NaN | NaN | NaN | NaN |
72 | 132644507 | FY2010 | NaN | NaN | NaN | NaN | NaN |
336 | 132690403 | FY2013 | NaN | NaN | NaN | NaN | NaN |
430 | 132690403 | FY2012 | NaN | NaN | NaN | NaN | NaN |
473 | 132690403 | FY2011 | NaN | NaN | NaN | NaN | NaN |
33 | 132690403 | FY2010 | NaN | NaN | NaN | NaN | NaN |
531 | 132738818 | FY2014 | -18373 | {u'TotalRevenueColumnAmt': u'-18373', u'ExclusionAmt': u'-18373'} | {u'TotalRevenueColumnAmt': u'-18373', u'ExclusionAmt': u'-18373'} | 217434 | 32760 |
162 | 132738818 | FY2013 | 0 | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN |
22 | 132738818 | FY2012 | 0 | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN |
82 | 132738818 | FY2011 | NaN | NaN | NaN | NaN | NaN |
140 | 132738818 | FY2010 | 0 | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN |
442 | 133119118 | FY2012 | 21748 | {u'TotalRevenueColumn': u'21748', u'ExclusionAmount': u'21748'} | NaN | NaN | NaN |
491 | 133119118 | FY2011 | 176541 | {u'TotalRevenueColumn': u'176541', u'ExclusionAmount': u'176541'} | NaN | NaN | NaN |
499 | 133119118 | FY2010 | 236763 | {u'TotalRevenueColumn': u'236763', u'ExclusionAmount': u'236763'} | NaN | NaN | NaN |
70 | 133552154 | FY2011 | 853901 | {u'TotalRevenueColumn': u'853901', u'ExclusionAmount': u'853901'} | NaN | NaN | NaN |
123 | 133552154 | FY2010 | 418298 | {u'TotalRevenueColumn': u'418298', u'ExclusionAmount': u'418298'} | NaN | NaN | NaN |
296 | 133600807 | FY2014 | NaN | NaN | NaN | NaN | NaN |
407 | 133600807 | FY2013 | NaN | NaN | NaN | NaN | NaN |
428 | 133600807 | FY2012 | NaN | NaN | NaN | NaN | NaN |
180 | 133600807 | FY2010 | NaN | NaN | NaN | NaN | NaN |
323 | 134147912 | FY2014 | NaN | NaN | NaN | NaN | NaN |
384 | 134147912 | FY2013 | NaN | NaN | NaN | NaN | NaN |
21 | 134147912 | FY2012 | 0 | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN |
246 | 134147912 | FY2011 | 0 | {u'TotalRevenueColumn': u'0'} | NaN | NaN | NaN |
df['fundraising'] = df['fundraising'].astype('float')
print df['fundraising'].dtype, '\n'
print df['fundraising'].describe()
float64 count 2.280000e+02 mean 8.138974e+04 std 5.064406e+05 min -2.839000e+05 25% NaN 50% NaN 75% NaN max 7.283318e+06 Name: fundraising, dtype: float64
print len(df[df['RelatedOrganizations'].notnull()])
print len(df[df['RelatedOrganizationsAmt'].notnull()])
df['RelatedOrganizations'] = np.where( ((df['RelatedOrganizations'].isnull())
& (df['RelatedOrganizationsAmt'].notnull())),
df['RelatedOrganizationsAmt'], df['RelatedOrganizations'] )
print len(df[df['RelatedOrganizations'].notnull()])
df['RelatedOrganizations'].describe().T
33 19 52
count 52 unique 46 top 0 freq 7 Name: RelatedOrganizations, dtype: object
print len(df[df['GovernmentGrants'].notnull()])
print len(df[df['GovernmentGrantsAmt'].notnull()])
df['GovernmentGrants'] = np.where( ((df['GovernmentGrants'].isnull()) & (df['GovernmentGrantsAmt'].notnull())),
df['GovernmentGrantsAmt'], df['GovernmentGrants'] )
print len(df[df['GovernmentGrants'].notnull()])
df['GovernmentGrants'].describe().T
213 70 213
count 213 unique 207 top 548987 freq 3 Name: GovernmentGrants, dtype: object
print len(df[df['PYProgramServiceRevenueAmt'].notnull()])
print len(df[df['CYProgramServiceRevenueAmt'].notnull()])
print len(df[df['ProgramServiceRevenueCY'].notnull()])
print len(df[df['TotalProgramServiceRevenue'].notnull()])
print len(df[df['TotalProgramServiceRevenueAmt'].notnull()]), '\n'
df['program_revenue'] = np.nan
print len(df[df['program_revenue'].notnull()])
df['program_revenue'] = np.where( df['PYProgramServiceRevenueAmt'].notnull(),
df['PYProgramServiceRevenueAmt'], df['program_revenue'] )
print len(df[df['program_revenue'].notnull()])
df['program_revenue'] = np.where( ((df['program_revenue'].isnull()) & (df['CYProgramServiceRevenueAmt'].notnull())),
df['CYProgramServiceRevenueAmt'], df['program_revenue'] )
print len(df[df['program_revenue'].notnull()])
df['program_revenue'] = np.where( ((df['program_revenue'].isnull()) & (df['ProgramServiceRevenueCY'].notnull())),
df['ProgramServiceRevenueCY'], df['program_revenue'] )
print len(df[df['program_revenue'].notnull()])
df['program_revenue'] = np.where( ((df['program_revenue'].isnull()) & (df['TotalProgramServiceRevenue'].notnull())),
df['TotalProgramServiceRevenue'], df['program_revenue'] )
print len(df[df['program_revenue'].notnull()])
df['program_revenue'] = np.where( ((df['program_revenue'].isnull())
& (df['TotalProgramServiceRevenueAmt'].notnull())),
df['TotalProgramServiceRevenueAmt'], df['program_revenue'] )
print len(df[df['program_revenue'].notnull()])
df['program_revenue'].describe().T
151 194 344 233 130 0 151 194 538 538 538
count 538 unique 259 top 0 freq 217 Name: program_revenue, dtype: object
df[['program_revenue', 'PYProgramServiceRevenueAmt', 'CYProgramServiceRevenueAmt',
'ProgramServiceRevenue', 'ProgramServiceRevenueCY', 'ProgramServiceRevenueGrp',
'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt']][:20]
program_revenue | PYProgramServiceRevenueAmt | CYProgramServiceRevenueAmt | ProgramServiceRevenue | ProgramServiceRevenueCY | ProgramServiceRevenueGrp | TotalProgramServiceRevenue | TotalProgramServiceRevenueAmt | |
---|---|---|---|---|---|---|---|---|
252 | 0 | NaN | NaN | NaN | 0 | NaN | 0 | NaN |
433 | 0 | NaN | NaN | NaN | 0 | NaN | 0 | NaN |
78 | 0 | NaN | NaN | NaN | 0 | NaN | 0 | NaN |
333 | 0 | 0 | 0 | NaN | NaN | NaN | NaN | 0 |
12 | 0 | NaN | NaN | NaN | 0 | NaN | 0 | NaN |
481 | 0 | NaN | NaN | NaN | 0 | NaN | 0 | NaN |
16 | 0 | NaN | NaN | NaN | 0 | NaN | NaN | NaN |
130 | 0 | NaN | NaN | NaN | 0 | NaN | 0 | NaN |
434 | 0 | 0 | 0 | NaN | NaN | NaN | NaN | NaN |
374 | 0 | NaN | NaN | NaN | 0 | NaN | NaN | NaN |
118 | 0 | NaN | NaN | NaN | 0 | NaN | NaN | NaN |
124 | 0 | NaN | NaN | NaN | 0 | NaN | NaN | NaN |
263 | 76449 | 76449 | 0 | NaN | NaN | NaN | NaN | NaN |
196 | 76449 | NaN | NaN | {u'TotalRevenueColumn': u'76449', u'RelatedOrExemptFunctionIncome': u'76449', u'Description': u'RENTAL INCOME'} | 76449 | NaN | 76449 | NaN |
256 | 205125 | NaN | NaN | {u'TotalRevenueColumn': u'205125', u'RelatedOrExemptFunctionIncome': u'205125', u'Description': u'RENTAL INCOME'} | 205125 | NaN | 205125 | NaN |
233 | 256109 | NaN | NaN | [{u'TotalRevenueColumn': u'174799', u'RelatedOrExemptFunctionIncome': u'174799', u'Description': u'RENTAL INCOME'}, {u'TotalRevenueColumn': u'81310', u'RelatedOrExemptFunctionIncome': u'81310', u'Description': u'THRIFT STORE REVENUE'}] | 256109 | NaN | 256109 | NaN |
129 | 107536 | NaN | NaN | NaN | 107536 | NaN | 107536 | NaN |
319 | 224006 | 224006 | 249554 | NaN | NaN | {u'TotalRevenueColumnAmt': u'249554', u'BusinessCd': u'611600', u'RelatedOrExemptFuncIncomeAmt': u'249554', u'Desc': u'TUITION'} | NaN | 249554 |
394 | 206873 | 206873 | 224006 | NaN | NaN | {u'TotalRevenueColumnAmt': u'224006', u'BusinessCd': u'611600', u'RelatedOrExemptFuncIncomeAmt': u'224006', u'Desc': u'TUITION'} | NaN | 224006 |
462 | 206873 | NaN | NaN | {u'BusinessCode': u'611600', u'TotalRevenueColumn': u'206873', u'RelatedOrExemptFunctionIncome': u'206873', u'Description': u'TUITION'} | 206873 | NaN | 206873 | NaN |
complexity_columns = ['CYContributionsGrantsAmt', 'FederatedCampaigns', #'FederatedCampaignsAmt',
'MembershipDues', #'MembershipDuesAmt',
#'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
#'FundraisingActivities', 'FundraisingActivitiesInd',
#'FundraisingAmt',
#'FundraisingDirectExpenses', 'FundraisingDirectExpensesAmt', 'FundraisingEvents',
#'FundraisingGrossIncomeAmt',
'fundraising',
'RelatedOrganizations', #'RelatedOrganizationsAmt',
'GovernmentGrants', #'GovernmentGrantsAmt',
#'PYProgramServiceRevenueAmt', 'CYProgramServiceRevenueAmt',
#'ProgramServiceRevenue', 'ProgramServiceRevenueCY', 'ProgramServiceRevenueGrp',
#'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt',
'program_revenue',
'CYInvestmentIncomeAmt', 'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp',
'OtherRevenueTotalAmt', 'OtherRevenueMiscGrp',
'TotalOtherRevenue', 'CYOtherRevenueAmt'
]
#[contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016',
# 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016',
# 'program_service_revenue_2016', 'other_revenue_2016']
df[complexity_columns][:10]
CYContributionsGrantsAmt | FederatedCampaigns | MembershipDues | fundraising | RelatedOrganizations | GovernmentGrants | program_revenue | CYInvestmentIncomeAmt | InvestmentIncome | InvestmentIncomeCurrentYear | InvestmentIncomeGrp | OtherRevenueTotalAmt | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
252 | NaN | NaN | NaN | 0.0 | NaN | 185408 | 0 | NaN | {u'TotalRevenueColumn': u'63', u'ExclusionAmount': u'63'} | 63 | NaN | NaN | NaN | 0 | NaN |
433 | NaN | NaN | NaN | 0.0 | NaN | NaN | 0 | NaN | {u'TotalRevenueColumn': u'0'} | 0 | NaN | NaN | NaN | 0 | NaN |
78 | NaN | NaN | NaN | 0.0 | NaN | NaN | 0 | NaN | {u'TotalRevenueColumn': u'0'} | 0 | NaN | NaN | NaN | 0 | NaN |
333 | 20437991 | NaN | NaN | 0.0 | 2053228 | 16124016 | 0 | 1539746 | NaN | NaN | {u'TotalRevenueColumnAmt': u'362651', u'ExclusionAmt': u'362651'} | 428192 | [{u'TotalRevenueColumnAmt': u'33550', u'BusinessCd': u'531390', u'RelatedOrExemptFuncIncomeAmt': u'33550', u'Desc': u'PARKING INCOME'}, {u'TotalRevenueColumnAmt': u'394642', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'394642', u'... | NaN | 1017789 |
12 | NaN | NaN | NaN | 0.0 | 2466016 | 16660756 | 0 | NaN | {u'TotalRevenueColumn': u'398381', u'ExclusionAmount': u'398381'} | 1195606 | NaN | NaN | NaN | 451157 | NaN |
481 | NaN | NaN | NaN | 221449.0 | 954610 | 21380666 | 0 | NaN | {u'TotalRevenueColumn': u'494644', u'ExclusionAmount': u'494644'} | 1249229 | NaN | NaN | NaN | 458249 | NaN |
16 | NaN | NaN | NaN | 23265.0 | NaN | 19362978 | 0 | NaN | {u'TotalRevenueColumn': u'646166', u'ExclusionAmount': u'646166'} | 703373 | NaN | NaN | NaN | 58199 | NaN |
130 | NaN | 0 | 0 | 1322.0 | 0 | 20439097 | 0 | NaN | {u'TotalRevenueColumn': u'761762', u'RelatedOrExemptFunctionIncome': u'0', u'UnrelatedBusinessRevenue': u'0', u'ExclusionAmount': u'761762'} | 1195892 | NaN | NaN | NaN | 91122 | NaN |
434 | 556113 | NaN | NaN | 9429.0 | NaN | NaN | 0 | 132915 | NaN | NaN | {u'TotalRevenueColumnAmt': u'132915', u'ExclusionAmt': u'132915'} | 15142 | {u'TotalRevenueColumnAmt': u'15142', u'BusinessCd': u'900099', u'ExclusionAmt': u'15142', u'Desc': u'OTHER INCOME'} | NaN | 75409 |
374 | NaN | NaN | NaN | 2923.0 | NaN | NaN | 0 | NaN | {u'TotalRevenueColumn': u'141660', u'ExclusionAmount': u'141660'} | 141660 | NaN | NaN | NaN | 14151 | NaN |
df[['CYInvestmentIncomeAmt', 'InvestmentIncomeCurrentYear']][10:16]
CYInvestmentIncomeAmt | InvestmentIncomeCurrentYear | |
---|---|---|
118 | NaN | 182952 |
124 | NaN | 92923 |
263 | -141807 | NaN |
196 | NaN | 0 |
256 | NaN | 0 |
233 | NaN | -52721 |
print len(df[df['CYInvestmentIncomeAmt'].notnull()])
print len(df[df['InvestmentIncomeCurrentYear'].notnull()])
df['CYInvestmentIncomeAmt'] = np.where( ((df['CYInvestmentIncomeAmt'].isnull())
& (df['InvestmentIncomeCurrentYear'].notnull())),
df['InvestmentIncomeCurrentYear'], df['CYInvestmentIncomeAmt'] )
print len(df[df['CYInvestmentIncomeAmt'].notnull()])
df['CYInvestmentIncomeAmt'].describe().T
194 344 538
count 538 unique 397 top 0 freq 123 Name: CYInvestmentIncomeAmt, dtype: object
complexity_columns = ['CYContributionsGrantsAmt', 'FederatedCampaigns', #'FederatedCampaignsAmt',
'MembershipDues', #'MembershipDuesAmt',
#'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
#'FundraisingActivities', 'FundraisingActivitiesInd',
#'FundraisingAmt',
#'FundraisingDirectExpenses', 'FundraisingDirectExpensesAmt', 'FundraisingEvents',
#'FundraisingGrossIncomeAmt',
'fundraising',
'RelatedOrganizations', #'RelatedOrganizationsAmt',
'GovernmentGrants', #'GovernmentGrantsAmt',
#'PYProgramServiceRevenueAmt', 'CYProgramServiceRevenueAmt',
#'ProgramServiceRevenue', 'ProgramServiceRevenueCY', 'ProgramServiceRevenueGrp',
#'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt',
'program_revenue',
'CYInvestmentIncomeAmt',
#'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp',
'OtherRevenueTotalAmt', 'OtherRevenueMiscGrp',
'TotalOtherRevenue', 'CYOtherRevenueAmt'
]
#[contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016',
# 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016',
# 'program_service_revenue_2016', 'other_revenue_2016']
df[complexity_columns][:10]
CYContributionsGrantsAmt | FederatedCampaigns | MembershipDues | fundraising | RelatedOrganizations | GovernmentGrants | program_revenue | CYInvestmentIncomeAmt | OtherRevenueTotalAmt | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
252 | NaN | NaN | NaN | 0.0 | NaN | 185408 | 0 | 63 | NaN | NaN | 0 | NaN |
433 | NaN | NaN | NaN | 0.0 | NaN | NaN | 0 | 0 | NaN | NaN | 0 | NaN |
78 | NaN | NaN | NaN | 0.0 | NaN | NaN | 0 | 0 | NaN | NaN | 0 | NaN |
333 | 20437991 | NaN | NaN | 0.0 | 2053228 | 16124016 | 0 | 1539746 | 428192 | [{u'TotalRevenueColumnAmt': u'33550', u'BusinessCd': u'531390', u'RelatedOrExemptFuncIncomeAmt': u'33550', u'Desc': u'PARKING INCOME'}, {u'TotalRevenueColumnAmt': u'394642', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'394642', u'... | NaN | 1017789 |
12 | NaN | NaN | NaN | 0.0 | 2466016 | 16660756 | 0 | 1195606 | NaN | NaN | 451157 | NaN |
481 | NaN | NaN | NaN | 221449.0 | 954610 | 21380666 | 0 | 1249229 | NaN | NaN | 458249 | NaN |
16 | NaN | NaN | NaN | 23265.0 | NaN | 19362978 | 0 | 703373 | NaN | NaN | 58199 | NaN |
130 | NaN | 0 | 0 | 1322.0 | 0 | 20439097 | 0 | 1195892 | NaN | NaN | 91122 | NaN |
434 | 556113 | NaN | NaN | 9429.0 | NaN | NaN | 0 | 132915 | 15142 | {u'TotalRevenueColumnAmt': u'15142', u'BusinessCd': u'900099', u'ExclusionAmt': u'15142', u'Desc': u'OTHER INCOME'} | NaN | 75409 |
374 | NaN | NaN | NaN | 2923.0 | NaN | NaN | 0 | 141660 | NaN | NaN | 14151 | NaN |
df[['OtherRevenueTotalAmt', 'OtherRevenueMiscGrp','TotalOtherRevenue', 'CYOtherRevenueAmt']][:10]
OtherRevenueTotalAmt | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | |
---|---|---|---|---|
252 | NaN | NaN | 0 | NaN |
433 | NaN | NaN | 0 | NaN |
78 | NaN | NaN | 0 | NaN |
333 | 428192 | [{u'TotalRevenueColumnAmt': u'33550', u'BusinessCd': u'531390', u'RelatedOrExemptFuncIncomeAmt': u'33550', u'Desc': u'PARKING INCOME'}, {u'TotalRevenueColumnAmt': u'394642', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'394642', u'... | NaN | 1017789 |
12 | NaN | NaN | 451157 | NaN |
481 | NaN | NaN | 458249 | NaN |
16 | NaN | NaN | 58199 | NaN |
130 | NaN | NaN | 91122 | NaN |
434 | 15142 | {u'TotalRevenueColumnAmt': u'15142', u'BusinessCd': u'900099', u'ExclusionAmt': u'15142', u'Desc': u'OTHER INCOME'} | NaN | 75409 |
374 | NaN | NaN | 14151 | NaN |
counter = 1
df['OtherRevMisc'] = np.nan
for index, row in df.iterrows():
#print row['NetIncomeFromFundraisingEvents']
#print type(ast.literal_eval(str(row['NetIncomeFromFundraisingEvents'])))
try:
fr_revs = str(row['OtherRevenueMiscGrp'])
if fr_revs.startswith('['):
#print 'yes'
fr_revs = fr_revs.replace('[', '').replace(']', '')
fr_revs = ast.literal_eval(fr_revs)
fr_revs = fr_revs[0]
else:
fr_revs = ast.literal_eval(fr_revs)
print fr_revs, '\n'
if 'TotalRevenueColumn' in fr_revs:
#print 'yes', fr_revs['TotalRevenueColumn']
df.ix[index, 'OtherRevMisc'] = fr_revs['TotalRevenueColumn']
elif 'TotalRevenueColumnAmt' in fr_revs:
#print 'yes', fr_revs['TotalRevenueColumnAmt']
df.ix[index, 'OtherRevMisc'] = fr_revs['TotalRevenueColumnAmt']
except:
df.ix[index, 'OtherRevMisc'] = np.nan
else:
df.ix[index, 'OtherRevMisc'] = np.nan
#counter +=1
#print counter
{u'TotalRevenueColumnAmt': u'33550', u'BusinessCd': u'531390', u'RelatedOrExemptFuncIncomeAmt': u'33550', u'Desc': u'PARKING INCOME'} {u'TotalRevenueColumnAmt': u'15142', u'BusinessCd': u'900099', u'ExclusionAmt': u'15142', u'Desc': u'OTHER INCOME'} {u'TotalRevenueColumnAmt': u'155000', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'155000', u'Desc': u'REFUND OF PRIOR-YEAR GRANT'} {u'TotalRevenueColumnAmt': u'4740', u'BusinessCd': u'624100', u'RelatedOrExemptFuncIncomeAmt': u'4740', u'Desc': u'CONFERENCE REGISTRATION'} {u'TotalRevenueColumnAmt': u'2027768', u'BusinessCd': u'900099', u'ExclusionAmt': u'2027768', u'Desc': u'INSURANCE PROCEEDS'} {u'TotalRevenueColumnAmt': u'25330', u'BusinessCd': u'900099', u'ExclusionAmt': u'25330', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'2581', u'BusinessCd': u'900099', u'ExclusionAmt': u'2581', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'1480447', u'RelatedOrExemptFuncIncomeAmt': u'1480447'} {u'TotalRevenueColumnAmt': u'3837835', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'3837835', u'Desc': u'RESTITUTION'} {u'TotalRevenueColumnAmt': u'458878', u'BusinessCd': u'900099', u'ExclusionAmt': u'458878', u'Desc': u'MEALS REVENUE'} {u'TotalRevenueColumnAmt': u'850981', u'BusinessCd': u'900099', u'ExclusionAmt': u'850981', u'Desc': u'REINSURANCE RECOVERY'} {u'TotalRevenueColumnAmt': u'600', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'600', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'11140', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'11140', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'457869', u'RelatedOrExemptFuncIncomeAmt': u'457869', u'Desc': u'PERPETUAL CARE ENDOWMENTS'} {u'TotalRevenueColumnAmt': u'1000000', u'RelatedOrExemptFuncIncomeAmt': u'1000000', u'Desc': u'LEGAL RECOVERY SETTLEMENT'} {u'TotalRevenueColumnAmt': u'7316917', u'BusinessCd': u'561000', u'RelatedOrExemptFuncIncomeAmt': u'7316917', u'Desc': u'AGENCY FEES'} {u'TotalRevenueColumnAmt': u'6937816', u'BusinessCd': u'561000', u'RelatedOrExemptFuncIncomeAmt': u'6937816', u'Desc': u'AGENCY FEES'} {u'TotalRevenueColumnAmt': u'74698', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'74698', u'Desc': u'OTHER INCOME'} {u'TotalRevenueColumnAmt': u'-2193', u'BusinessCd': u'900099', u'ExclusionAmt': u'-2193', u'Desc': u'MISCELLANEOUS INCOME'} {u'TotalRevenueColumnAmt': u'6750977', u'BusinessCd': u'900099', u'ExclusionAmt': u'6750977', u'Desc': u'DEBT FORGIVENESS'} {u'TotalRevenueColumnAmt': u'749391', u'BusinessCd': u'900099', u'ExclusionAmt': u'749391', u'Desc': u'DEBT FORGIVENESS'} {u'TotalRevenueColumnAmt': u'517', u'BusinessCd': u'713990', u'RelatedOrExemptFuncIncomeAmt': u'517', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'121400', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'121400', u'Desc': u'BAD DEBT RECOVERY'} {u'TotalRevenueColumnAmt': u'292707', u'BusinessCd': u'900099', u'ExclusionAmt': u'292707', u'Desc': u'DEBT FORGIVENESS'} {u'TotalRevenueColumnAmt': u'2653040', u'BusinessCd': u'900099', u'ExclusionAmt': u'2653040', u'Desc': u'DEBT FORGIVENESS'} {u'TotalRevenueColumnAmt': u'5297259', u'BusinessCd': u'900099', u'ExclusionAmt': u'5297259', u'Desc': u'DEBT FORGIVENESS'} {u'TotalRevenueColumnAmt': u'9096', u'BusinessCd': u'624200', u'ExclusionAmt': u'9096', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'1143351', u'ExclusionAmt': u'1143351', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'667098', u'ExclusionAmt': u'667098', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'147143', u'BusinessCd': u'900099', u'ExclusionAmt': u'147143', u'Desc': u'REFUNDED GRANTS'} {u'TotalRevenueColumnAmt': u'81840', u'BusinessCd': u'900099', u'ExclusionAmt': u'81840', u'Desc': u'REFUNDED GRANTS'} {u'TotalRevenueColumnAmt': u'15652', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'15652', u'Desc': u'OTHER REVENUE'} {u'TotalRevenueColumnAmt': u'5100', u'BusinessCd': u'900099', u'ExclusionAmt': u'5100', u'Desc': u'MISCELLANEOUS INCOME'} {u'TotalRevenueColumnAmt': u'2993', u'RelatedOrExemptFuncIncomeAmt': u'2993', u'Desc': u'OTHER'} {u'TotalRevenueColumnAmt': u'2407', u'RelatedOrExemptFuncIncomeAmt': u'2407', u'Desc': u'OTHER'} {u'TotalRevenueColumnAmt': u'325000', u'ExclusionAmt': u'325000', u'Desc': u'SETTLEMENT'} {u'TotalRevenueColumnAmt': u'1050', u'BusinessCd': u'900099', u'ExclusionAmt': u'1050', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'100', u'BusinessCd': u'900099', u'ExclusionAmt': u'100', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'561', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'561', u'Desc': u'OTHER INCOME'} {u'TotalRevenueColumnAmt': u'629', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'629', u'Desc': u'OTHER INCOME'} {u'TotalRevenueColumnAmt': u'2111004', u'BusinessCd': u'900099', u'ExclusionAmt': u'2111004', u'Desc': u'DEBT FORGIVENESS'} {u'UnrelatedBusinessRevenueAmt': u'336417', u'TotalRevenueColumnAmt': u'336417', u'BusinessCd': u'900004', u'Desc': u'JOB BOARD'} {u'UnrelatedBusinessRevenueAmt': u'247673', u'TotalRevenueColumnAmt': u'247673', u'BusinessCd': u'900004', u'Desc': u'JOB PLACEMENT'} {u'TotalRevenueColumnAmt': u'3990', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'3990', u'Desc': u'CONVENTIONS'} {u'TotalRevenueColumnAmt': u'1057365', u'BusinessCd': u'900099', u'ExclusionAmt': u'1057365', u'Desc': u'DEBT FORGIVENESS'} {u'TotalRevenueColumnAmt': u'27153', u'BusinessCd': u'900099', u'ExclusionAmt': u'27153', u'Desc': u'MISCELLANEOUS REVENUE'} {u'TotalRevenueColumnAmt': u'13039', u'ExclusionAmt': u'13039', u'Desc': u'MISC. REVENUE'} {u'TotalRevenueColumnAmt': u'13039', u'BusinessCd': u'900099', u'ExclusionAmt': u'13039', u'Desc': u'Miscellaneous'} {u'TotalRevenueColumnAmt': u'1262179', u'BusinessCd': u'900099', u'ExclusionAmt': u'1262179', u'Desc': u'DEBT FORGIVENESS'} {u'TotalRevenueColumnAmt': u'4985', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'4985', u'Desc': u'MISCELLANEOUS INCOME'} {u'UnrelatedBusinessRevenueAmt': u'165000', u'TotalRevenueColumnAmt': u'165000', u'BusinessCd': u'541800', u'Desc': u'ADVERTISING'} {u'UnrelatedBusinessRevenueAmt': u'130000', u'TotalRevenueColumnAmt': u'130000', u'BusinessCd': u'541800', u'Desc': u'ADVERTISING'} {u'TotalRevenueColumnAmt': u'165533', u'BusinessCd': u'900099', u'ExclusionAmt': u'165533', u'Desc': u'RECOVERY ON THEFT'} {u'TotalRevenueColumnAmt': u'45246', u'BusinessCd': u'999999', u'RelatedOrExemptFuncIncomeAmt': u'45246', u'Desc': u'MANAGEMENT SERVICE'} {u'TotalRevenueColumnAmt': u'1235', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'1235', u'Desc': u'MISCELLANEOUS INCOME'} {u'TotalRevenueColumnAmt': u'205555', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'205555', u'Desc': u'INCOME FROM FIRE INSURANCE'} {u'TotalRevenueColumnAmt': u'183015', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'183015', u'Desc': u'DEBT WRITE-OFF'} {u'TotalRevenueColumnAmt': u'13241', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'13241', u'Desc': u'RETURN ON FCC INVESTME'} {u'TotalRevenueColumnAmt': u'13481', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'13481', u'Desc': u'MISCELLANEOUS INCOME'} {u'TotalRevenueColumnAmt': u'55430', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'55430', u'Desc': u'MISCELLANEOUS INCOME'} {u'TotalRevenueColumnAmt': u'6436', u'ExclusionAmt': u'6436', u'Desc': u'MISCELLANEOUS'} {u'BusinessCd': u'531190', u'Desc': u'LOSS FROM PARTNERSHIP'} {u'TotalRevenueColumnAmt': u'25911', u'ExclusionAmt': u'25911', u'Desc': u'GAIN ON CSV OF LIFE INSURANCE'} {u'TotalRevenueColumnAmt': u'2737', u'ExclusionAmt': u'2737', u'Desc': u'Other revenue'} {u'TotalRevenueColumnAmt': u'6480', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'6480', u'Desc': u'MISC REFUNDS'} {u'TotalRevenueColumnAmt': u'33', u'ExclusionAmt': u'33', u'Desc': u'Other'} {u'TotalRevenueColumnAmt': u'5171923', u'BusinessCd': u'900099', u'ExclusionAmt': u'5171923', u'Desc': u'MULTIMEDIA CONTRACTUAL INCOME'} {u'UnrelatedBusinessRevenueAmt': u'16379', u'TotalRevenueColumnAmt': u'25900', u'BusinessCd': u'900099', u'ExclusionAmt': u'9521', u'Desc': u'FDOT ENCROACHMENT REMOVAL'} {u'TotalRevenueColumnAmt': u'16893', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'16893', u'Desc': u'SMALL BUSINESS HEALTH'} {u'TotalRevenueColumnAmt': u'11214', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'11214', u'Desc': u'FEDERAL TAX REFUND'} {u'TotalRevenueColumnAmt': u'427458', u'RelatedOrExemptFuncIncomeAmt': u'159837', u'BusinessCd': u'900199', u'ExclusionAmt': u'267621', u'Desc': u'OTHER REVENUE'} {u'TotalRevenueColumnAmt': u'301470', u'BusinessCd': u'621110', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'738003', u'BusinessCd': u'621112', u'RelatedOrExemptFuncIncomeAmt': u'738003', u'Desc': u'Miscellaneous Revenue'} {u'TotalRevenueColumnAmt': u'130433', u'BusinessCd': u'900099', u'ExclusionAmt': u'130433', u'Desc': u'FLOOD INSURANCE PAYOUT'} {u'TotalRevenueColumnAmt': u'772420', u'BusinessCd': u'900099', u'ExclusionAmt': u'772420', u'Desc': u'BABC REGISTRATION FEES'} {u'TotalRevenueColumnAmt': u'1284814', u'BusinessCd': u'900099', u'ExclusionAmt': u'1284814', u'Desc': u'DEBT FORGIVENESS'} {u'TotalRevenueColumnAmt': u'1271392', u'BusinessCd': u'900099', u'ExclusionAmt': u'1271392', u'Desc': u'DEBT FORGIVENESS'} {u'TotalRevenueColumnAmt': u'87550', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'87550', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'93778', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'93778', u'Desc': u'INSURANCE PROCEEDS'} {u'TotalRevenueColumnAmt': u'378405', u'BusinessCd': u'541800', u'RelatedOrExemptFuncIncomeAmt': u'378405', u'Desc': u'ADMINISTRATIVE INCOME'} {u'TotalRevenueColumnAmt': u'134786', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'134786', u'Desc': u'INSURANCE PROCEEDS FRO'} {u'TotalRevenueColumnAmt': u'105205', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'105205', u'Desc': u'MISCELLANEOUS'} {u'UnrelatedBusinessRevenueAmt': u'7929062', u'TotalRevenueColumnAmt': u'7929062', u'BusinessCd': u'812300', u'Desc': u'LAUNDRY SERVICES'} {u'TotalRevenueColumnAmt': u'5706', u'RelatedOrExemptFuncIncomeAmt': u'5706', u'Desc': u'OTHER RECEIPTS'} {u'TotalRevenueColumnAmt': u'2380', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'2380', u'Desc': u'GAIN/LOSS ON CURRENCY'} {u'TotalRevenueColumnAmt': u'-1340', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'-1340', u'Desc': u'GAIN/LOSS ON CURRENCY'} {u'TotalRevenueColumnAmt': u'1613', u'RelatedOrExemptFuncIncomeAmt': u'1613', u'Desc': u'DAF Mgmt Fees'} {u'TotalRevenueColumnAmt': u'11566', u'RelatedOrExemptFuncIncomeAmt': u'11566', u'Desc': u'OTHER REVENUE'} {u'TotalRevenueColumnAmt': u'37665', u'RelatedOrExemptFuncIncomeAmt': u'37665', u'Desc': u'OTHER REVENUE'} {u'TotalRevenueColumnAmt': u'5005', u'ExclusionAmt': u'5005', u'Desc': u'MISC. REFUNDS'} {u'TotalRevenueColumnAmt': u'66924', u'BusinessCd': u'515100', u'ExclusionAmt': u'66924', u'Desc': u'MISCELLANEOUS'} {u'TotalRevenueColumnAmt': u'1548749', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'1548749', u'Desc': u'VALUE ADDED TAX REBATE'} {u'TotalRevenueColumnAmt': u'137059', u'BusinessCd': u'900099', u'ExclusionAmt': u'137059', u'Desc': u'OTHER'} {u'TotalRevenueColumnAmt': u'7299', u'ExclusionAmt': u'7299', u'Desc': u'Gain on A/C ins. claim'} {u'TotalRevenueColumnAmt': u'50340', u'BusinessCd': u'900099', u'ExclusionAmt': u'50340', u'Desc': u'List Rental'} {u'TotalRevenueColumnAmt': u'19820777', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'19820777', u'Desc': u'OTHER INCOME- FROM AFFILIATES'} {u'TotalRevenueColumnAmt': u'15369743', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'15369743', u'Desc': u'OTHER INCOME- FROM AFFILIATES'}
df[['OtherRevenueTotalAmt', 'OtherRevMisc', 'OtherRevenueMiscGrp','TotalOtherRevenue', 'CYOtherRevenueAmt']][:10]
OtherRevenueTotalAmt | OtherRevMisc | OtherRevenueMiscGrp | TotalOtherRevenue | CYOtherRevenueAmt | |
---|---|---|---|---|---|
252 | NaN | NaN | NaN | 0 | NaN |
433 | NaN | NaN | NaN | 0 | NaN |
78 | NaN | NaN | NaN | 0 | NaN |
333 | 428192 | NaN | [{u'TotalRevenueColumnAmt': u'33550', u'BusinessCd': u'531390', u'RelatedOrExemptFuncIncomeAmt': u'33550', u'Desc': u'PARKING INCOME'}, {u'TotalRevenueColumnAmt': u'394642', u'BusinessCd': u'900099', u'RelatedOrExemptFuncIncomeAmt': u'394642', u'... | NaN | 1017789 |
12 | NaN | NaN | NaN | 451157 | NaN |
481 | NaN | NaN | NaN | 458249 | NaN |
16 | NaN | NaN | NaN | 58199 | NaN |
130 | NaN | NaN | NaN | 91122 | NaN |
434 | 15142 | NaN | {u'TotalRevenueColumnAmt': u'15142', u'BusinessCd': u'900099', u'ExclusionAmt': u'15142', u'Desc': u'OTHER INCOME'} | NaN | 75409 |
374 | NaN | NaN | NaN | 14151 | NaN |
print len(df[df['OtherRevenueTotalAmt'].notnull()])
print len(df[df['InvestmentIncomeCurrentYear'].notnull()])
df['OtherRevenueTotalAmt'] = np.where( ((df['OtherRevenueTotalAmt'].isnull()) & (df['CYOtherRevenueAmt'].notnull())),
df['CYOtherRevenueAmt'], df['OtherRevenueTotalAmt'] )
print len(df[df['OtherRevenueTotalAmt'].notnull()])
df['OtherRevenueTotalAmt'] = np.where( ((df['OtherRevenueTotalAmt'].isnull()) & (df['TotalOtherRevenue'].notnull())),
df['TotalOtherRevenue'], df['OtherRevenueTotalAmt'] )
print len(df[df['OtherRevenueTotalAmt'].notnull()])
df['OtherRevenueTotalAmt'] = np.where( ((df['OtherRevenueTotalAmt'].isnull()) & (df['OtherRevMisc'].notnull())),
df['OtherRevMisc'], df['OtherRevenueTotalAmt'] )
print len(df[df['OtherRevenueTotalAmt'].notnull()])
df['OtherRevenueTotalAmt'].describe().T
120 344 194 414 414
count 414 unique 294 top 0 freq 119 Name: OtherRevenueTotalAmt, dtype: object
complexity_columns = ['CYContributionsGrantsAmt', 'FederatedCampaigns', #'FederatedCampaignsAmt',
'MembershipDues', #'MembershipDuesAmt',
#'NetIncomeFromFundraisingEvents', 'NetIncmFromFundraisingEvtGrp',
#'FundraisingActivities', 'FundraisingActivitiesInd',
#'FundraisingAmt',
#'FundraisingDirectExpenses', 'FundraisingDirectExpensesAmt', 'FundraisingEvents',
#'FundraisingGrossIncomeAmt',
'fundraising',
'RelatedOrganizations', #'RelatedOrganizationsAmt',
'GovernmentGrants', #'GovernmentGrantsAmt',
#'PYProgramServiceRevenueAmt', 'CYProgramServiceRevenueAmt',
#'ProgramServiceRevenue', 'ProgramServiceRevenueCY', 'ProgramServiceRevenueGrp',
#'TotalProgramServiceRevenue', 'TotalProgramServiceRevenueAmt',
'program_revenue',
'CYInvestmentIncomeAmt',
#'InvestmentIncome', 'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp',
'OtherRevenueTotalAmt', #'OtherRevenueMiscGrp',
#'TotalOtherRevenue', 'CYOtherRevenueAmt'
]
#[contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016',
# 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016',
# 'program_service_revenue_2016', 'other_revenue_2016']
df[complexity_columns][:10]
CYContributionsGrantsAmt | FederatedCampaigns | MembershipDues | fundraising | RelatedOrganizations | GovernmentGrants | program_revenue | CYInvestmentIncomeAmt | OtherRevenueTotalAmt | |
---|---|---|---|---|---|---|---|---|---|
252 | NaN | NaN | NaN | 0.0 | NaN | 185408 | 0 | 63 | 0 |
433 | NaN | NaN | NaN | 0.0 | NaN | NaN | 0 | 0 | 0 |
78 | NaN | NaN | NaN | 0.0 | NaN | NaN | 0 | 0 | 0 |
333 | 20437991 | NaN | NaN | 0.0 | 2053228 | 16124016 | 0 | 1539746 | 428192 |
12 | NaN | NaN | NaN | 0.0 | 2466016 | 16660756 | 0 | 1195606 | 451157 |
481 | NaN | NaN | NaN | 221449.0 | 954610 | 21380666 | 0 | 1249229 | 458249 |
16 | NaN | NaN | NaN | 23265.0 | NaN | 19362978 | 0 | 703373 | 58199 |
130 | NaN | 0 | 0 | 1322.0 | 0 | 20439097 | 0 | 1195892 | 91122 |
434 | 556113 | NaN | NaN | 9429.0 | NaN | NaN | 0 | 132915 | 15142 |
374 | NaN | NaN | NaN | 2923.0 | NaN | NaN | 0 | 141660 | 14151 |
print complexity_columns
['CYContributionsGrantsAmt', 'FederatedCampaigns', 'MembershipDues', 'fundraising', 'RelatedOrganizations', 'GovernmentGrants', 'program_revenue', 'CYInvestmentIncomeAmt', 'OtherRevenueTotalAmt']
df['CYContributionsGrantsAmt'] = df['CYContributionsGrantsAmt'].astype('float')
df['FederatedCampaigns'] = df['FederatedCampaigns'].astype('float')
df['MembershipDues'] = df['MembershipDues'].astype('float')
df['fundraising'] = df['fundraising'].astype('float')
df['RelatedOrganizations'] = df['RelatedOrganizations'].astype('float')
df['GovernmentGrants'] = df['GovernmentGrants'].astype('float')
df['program_revenue'] = df['program_revenue'].astype('float')
df['CYInvestmentIncomeAmt'] = df['CYInvestmentIncomeAmt'].astype('float')
df['OtherRevenueTotalAmt'] = df['OtherRevenueTotalAmt'].astype('float')
df[complexity_columns].dtypes
CYContributionsGrantsAmt float64 FederatedCampaigns float64 MembershipDues float64 fundraising float64 RelatedOrganizations float64 GovernmentGrants float64 program_revenue float64 CYInvestmentIncomeAmt float64 OtherRevenueTotalAmt float64 dtype: object
df[complexity_columns].describe().T
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
CYContributionsGrantsAmt | 194.0 | 1.462879e+07 | 9.210089e+07 | 0.0 | NaN | NaN | NaN | 9.182532e+08 |
FederatedCampaigns | 40.0 | 8.356717e+05 | 1.710843e+06 | 0.0 | NaN | NaN | NaN | 6.051068e+06 |
MembershipDues | 44.0 | 9.858459e+04 | 1.330675e+05 | 0.0 | NaN | NaN | NaN | 6.000830e+05 |
fundraising | 228.0 | 8.138974e+04 | 5.064406e+05 | -283900.0 | NaN | NaN | NaN | 7.283318e+06 |
RelatedOrganizations | 52.0 | 3.521487e+06 | 5.394679e+06 | 0.0 | NaN | NaN | NaN | 2.086632e+07 |
GovernmentGrants | 213.0 | 1.975853e+07 | 9.133767e+07 | 0.0 | NaN | NaN | NaN | 7.549167e+08 |
program_revenue | 538.0 | 4.042349e+07 | 2.827323e+08 | 0.0 | 0.0 | 172176.0 | 3712362.0 | 3.090399e+09 |
CYInvestmentIncomeAmt | 538.0 | 9.094292e+05 | 1.338435e+07 | -218265025.0 | 0.0 | 1222.0 | 36411.5 | 1.583775e+08 |
OtherRevenueTotalAmt | 414.0 | 2.131831e+06 | 1.163631e+07 | -162277.0 | NaN | NaN | NaN | 1.463759e+08 |
#df = df.drop('other_revenue', 1)
print len(df[df['CYInvestmentIncomeAmt'].notnull()])
print len(df[df['OtherRevenueTotalAmt'].notnull()])
df['other_revenue_binary'] = np.nan
df['other_revenue_binary'] = np.where(df['CYInvestmentIncomeAmt'].notnull(), df['CYInvestmentIncomeAmt'],
df['OtherRevenueTotalAmt'])
print len(df[df['other_revenue_binary'].notnull()])
538 414 538
print complexity_columns
['CYContributionsGrantsAmt', 'FederatedCampaigns', 'MembershipDues', 'fundraising', 'RelatedOrganizations', 'GovernmentGrants', 'program_revenue', 'CYInvestmentIncomeAmt', 'OtherRevenueTotalAmt']
complexity_columns = ['CYContributionsGrantsAmt', 'FederatedCampaigns', 'MembershipDues', 'fundraising',
'RelatedOrganizations', 'GovernmentGrants', 'program_revenue', 'other_revenue_binary',
'CYInvestmentIncomeAmt', 'OtherRevenueTotalAmt']
df[complexity_columns][10:30]
CYContributionsGrantsAmt | FederatedCampaigns | MembershipDues | fundraising | RelatedOrganizations | GovernmentGrants | program_revenue | other_revenue_binary | CYInvestmentIncomeAmt | OtherRevenueTotalAmt | |
---|---|---|---|---|---|---|---|---|---|---|
118 | NaN | NaN | NaN | 47210.0 | NaN | NaN | 0.0 | 182952.0 | 182952.0 | 124761.0 |
124 | NaN | NaN | NaN | 99073.0 | NaN | NaN | 0.0 | 92923.0 | 92923.0 | 9015.0 |
263 | 994088.0 | NaN | NaN | NaN | NaN | NaN | 76449.0 | -141807.0 | -141807.0 | 0.0 |
196 | NaN | NaN | NaN | NaN | NaN | NaN | 76449.0 | 0.0 | 0.0 | NaN |
256 | NaN | NaN | NaN | NaN | NaN | NaN | 205125.0 | 0.0 | 0.0 | NaN |
233 | NaN | NaN | NaN | NaN | NaN | NaN | 256109.0 | -52721.0 | -52721.0 | NaN |
129 | NaN | NaN | NaN | NaN | NaN | NaN | 107536.0 | 0.0 | 0.0 | NaN |
319 | 104931.0 | NaN | NaN | 0.0 | NaN | NaN | 224006.0 | 23.0 | 23.0 | 0.0 |
394 | 86418.0 | NaN | NaN | 0.0 | NaN | NaN | 206873.0 | 30.0 | 30.0 | 0.0 |
462 | NaN | NaN | NaN | 0.0 | NaN | NaN | 206873.0 | 4.0 | 4.0 | 0.0 |
101 | NaN | NaN | NaN | 0.0 | NaN | NaN | 194006.0 | 7.0 | 7.0 | 0.0 |
528 | 12294982.0 | NaN | NaN | NaN | NaN | 646173.0 | 0.0 | 1549.0 | 1549.0 | 0.0 |
401 | 12741996.0 | NaN | NaN | NaN | NaN | 629939.0 | 0.0 | 1656.0 | 1656.0 | 0.0 |
361 | NaN | NaN | NaN | NaN | NaN | 437339.0 | 0.0 | 2695.0 | 2695.0 | NaN |
36 | NaN | NaN | NaN | NaN | NaN | 461226.0 | 0.0 | 2317.0 | 2317.0 | NaN |
41 | NaN | NaN | NaN | NaN | NaN | 373496.0 | 0.0 | 2984.0 | 2984.0 | NaN |
63 | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | NaN |
93 | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 642.0 | 642.0 | NaN |
504 | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 616.0 | 616.0 | NaN |
31 | 307743.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | 5902.0 | 5902.0 | 155000.0 |
#[contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016',
# 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016',
# 'program_service_revenue_2016', 'other_revenue_2016']
df[complexity_columns].describe().T
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
CYContributionsGrantsAmt | 194.0 | 1.462879e+07 | 9.210089e+07 | 0.0 | NaN | NaN | NaN | 9.182532e+08 |
FederatedCampaigns | 40.0 | 8.356717e+05 | 1.710843e+06 | 0.0 | NaN | NaN | NaN | 6.051068e+06 |
MembershipDues | 44.0 | 9.858459e+04 | 1.330675e+05 | 0.0 | NaN | NaN | NaN | 6.000830e+05 |
fundraising | 228.0 | 8.138974e+04 | 5.064406e+05 | -283900.0 | NaN | NaN | NaN | 7.283318e+06 |
RelatedOrganizations | 52.0 | 3.521487e+06 | 5.394679e+06 | 0.0 | NaN | NaN | NaN | 2.086632e+07 |
GovernmentGrants | 213.0 | 1.975853e+07 | 9.133767e+07 | 0.0 | NaN | NaN | NaN | 7.549167e+08 |
program_revenue | 538.0 | 4.042349e+07 | 2.827323e+08 | 0.0 | 0.0 | 172176.0 | 3712362.0 | 3.090399e+09 |
other_revenue_binary | 538.0 | 9.094292e+05 | 1.338435e+07 | -218265025.0 | 0.0 | 1222.0 | 36411.5 | 1.583775e+08 |
CYInvestmentIncomeAmt | 538.0 | 9.094292e+05 | 1.338435e+07 | -218265025.0 | 0.0 | 1222.0 | 36411.5 | 1.583775e+08 |
OtherRevenueTotalAmt | 414.0 | 2.131831e+06 | 1.163631e+07 | -162277.0 | NaN | NaN | NaN | 1.463759e+08 |
print complexity_columns
['CYContributionsGrantsAmt', 'FederatedCampaigns', 'MembershipDues', 'fundraising', 'RelatedOrganizations', 'GovernmentGrants', 'program_revenue', 'other_revenue_binary', 'CYInvestmentIncomeAmt', 'OtherRevenueTotalAmt']
complexity_columns = ['CYContributionsGrantsAmt', 'FederatedCampaigns', 'MembershipDues', 'fundraising',
'RelatedOrganizations', 'GovernmentGrants', 'program_revenue', 'other_revenue_binary']
df['complexity'] = (df[complexity_columns] > 0).sum(1)
df['complexity'][:3]
252 2 433 0 78 0 Name: complexity, dtype: int64
complexity_columns = ['complexity'] + complexity_columns
df['complexity'].describe()
count 538.000000 mean 2.431227 std 1.278642 min 0.000000 25% 2.000000 50% 2.000000 75% 3.000000 max 7.000000 Name: complexity, dtype: float64
df[complexity_columns].describe().T
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
complexity | 538.0 | 2.431227e+00 | 1.278642e+00 | 0.0 | 2.0 | 2.0 | 3.0 | 7.000000e+00 |
CYContributionsGrantsAmt | 194.0 | 1.462879e+07 | 9.210089e+07 | 0.0 | NaN | NaN | NaN | 9.182532e+08 |
FederatedCampaigns | 40.0 | 8.356717e+05 | 1.710843e+06 | 0.0 | NaN | NaN | NaN | 6.051068e+06 |
MembershipDues | 44.0 | 9.858459e+04 | 1.330675e+05 | 0.0 | NaN | NaN | NaN | 6.000830e+05 |
fundraising | 228.0 | 8.138974e+04 | 5.064406e+05 | -283900.0 | NaN | NaN | NaN | 7.283318e+06 |
RelatedOrganizations | 52.0 | 3.521487e+06 | 5.394679e+06 | 0.0 | NaN | NaN | NaN | 2.086632e+07 |
GovernmentGrants | 213.0 | 1.975853e+07 | 9.133767e+07 | 0.0 | NaN | NaN | NaN | 7.549167e+08 |
program_revenue | 538.0 | 4.042349e+07 | 2.827323e+08 | 0.0 | 0.0 | 172176.0 | 3712362.0 | 3.090399e+09 |
other_revenue_binary | 538.0 | 9.094292e+05 | 1.338435e+07 | -218265025.0 | 0.0 | 1222.0 | 36411.5 | 1.583775e+08 |
#[contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016',
# 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016',
# 'program_service_revenue_2016', 'other_revenue_2016']
SOX_columns = ['conflict_of_interest_policy', 'whistleblower_policy', 'records_retention_policy']
#df['SOX_policies'] = (df[SOX_columns] > 0).sum(1)
df['SOX_policies'] = np.where(df['conflict_of_interest_policy'].notnull(),
(df[SOX_columns] > 0).sum(1), np.nan)
df['SOX_policies'][:3]
252 2.0 433 2.0 78 2.0 Name: SOX_policies, dtype: float64
SOX_columns = ['SOX_policies'] + SOX_columns
df[SOX_columns][:8]
SOX_policies | conflict_of_interest_policy | whistleblower_policy | records_retention_policy | |
---|---|---|---|---|
252 | 2.0 | 1 | 0 | 1 |
433 | 2.0 | 1 | 0 | 1 |
78 | 2.0 | 1 | 0 | 1 |
333 | 3.0 | 1 | 1 | 1 |
12 | 3.0 | 1 | 1 | 1 |
481 | 3.0 | 1 | 1 | 1 |
16 | 3.0 | 1 | 1 | 1 |
130 | 3.0 | 1 | 1 | 1 |
Binary version ('1' equals any SOX policy).
df['SOX_policies_binary'] = df['SOX_policies']
df['SOX_policies_binary'] = np.where(df['SOX_policies_binary']>=1, 1, df['SOX_policies'])
print df['SOX_policies_binary'].value_counts().sum()
df['SOX_policies_binary'].value_counts()
538
1.0 463 0.0 75 Name: SOX_policies_binary, dtype: int64
Create binary version for all three policies
df['SOX_policies_all_binary'] = np.nan
df['SOX_policies_all_binary'] = df['SOX_policies']
df['SOX_policies_all_binary'] = np.where( ((df['SOX_policies_all_binary']==1) | (df['SOX_policies_all_binary']==2)),
0, df['SOX_policies_all_binary'])
df['SOX_policies_all_binary'] = np.where(df['SOX_policies_all_binary']==3, 1, df['SOX_policies_all_binary'])
print df['SOX_policies_all_binary'].value_counts()
1.0 293 0.0 245 Name: SOX_policies_all_binary, dtype: int64
print df['SOX_policies'].value_counts(), '\n'
print df['SOX_policies_binary'].value_counts(), '\n'
print df['SOX_policies_all_binary'].value_counts()
3.0 293 2.0 105 0.0 75 1.0 65 Name: SOX_policies, dtype: int64 1.0 463 0.0 75 Name: SOX_policies_binary, dtype: int64 1.0 293 0.0 245 Name: SOX_policies_all_binary, dtype: int64
print len(df.columns)
print df.columns.tolist()
54 ['FYE', u'EIN', u'OrganizationName', u'URL', u'SubmittedOn', u'TaxPeriod', 'tot_rev', 'whistleblower_policy', 'conflict_of_interest_policy', 'records_retention_policy', u'CYTotalExpensesAmt', u'TotalExpensesCurrentYear', u'TotalFunctionalExpenses', u'TotalFunctionalExpensesGrp', u'TotalProgramServiceExpense', u'TotalProgramServiceExpensesAmt', u'CYContributionsGrantsAmt', u'FederatedCampaigns', u'FederatedCampaignsAmt', u'MembershipDues', u'MembershipDuesAmt', u'NetIncomeFromFundraisingEvents', u'NetIncmFromFundraisingEvtGrp', u'FundraisingAmt', u'FundraisingGrossIncomeAmt', u'RelatedOrganizations', u'RelatedOrganizationsAmt', u'GovernmentGrants', u'GovernmentGrantsAmt', u'PYProgramServiceRevenueAmt', u'CYProgramServiceRevenueAmt', u'ProgramServiceRevenue', u'ProgramServiceRevenueCY', u'ProgramServiceRevenueGrp', u'TotalProgramServiceRevenue', u'TotalProgramServiceRevenueAmt', u'CYInvestmentIncomeAmt', u'InvestmentIncome', u'InvestmentIncomeCurrentYear', u'InvestmentIncomeGrp', u'OtherRevenueTotalAmt', u'OtherRevenueMiscGrp', u'TotalOtherRevenue', u'CYOtherRevenueAmt', 'total_expenses', 'program_expenses', 'fundraising', 'program_revenue', 'OtherRevMisc', 'other_revenue_binary', 'complexity', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary']
df.to_pickle('538 e-file 990s for 2016 donor advisory organizations, v3 (all calc columns).pkl')
file_list_columns = ['EIN', 'OrganizationName', 'URL', 'SubmittedOn', 'TaxPeriod', 'FYE']
SOX_columns = ['whistleblower_policy', 'conflict_of_interest_policy', 'records_retention_policy',
'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary']
control_columns = ['tot_rev', 'total_expenses', 'program_expenses', 'program_efficiency', 'complexity']
unneeded_columns = ['CYTotalExpensesAmt', u'TotalExpensesCurrentYear', u'TotalFunctionalExpenses',
'TotalFunctionalExpensesGrp', u'TotalProgramServiceExpense', u'TotalProgramServiceExpensesAmt',
'CYContributionsGrantsAmt', u'FederatedCampaigns', u'FederatedCampaignsAmt', u'MembershipDues',
'MembershipDuesAmt', u'NetIncomeFromFundraisingEvents', u'NetIncmFromFundraisingEvtGrp',
'FundraisingAmt', 'FundraisingGrossIncomeAmt', u'RelatedOrganizations',
'RelatedOrganizationsAmt', u'GovernmentGrants', 'GovernmentGrantsAmt',
'PYProgramServiceRevenueAmt', u'CYProgramServiceRevenueAmt', 'ProgramServiceRevenue',
'ProgramServiceRevenueCY', u'ProgramServiceRevenueGrp', u'TotalProgramServiceRevenue',
'TotalProgramServiceRevenueAmt', u'CYInvestmentIncomeAmt', u'InvestmentIncome',
'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp', u'OtherRevenueTotalAmt',
'OtherRevenueMiscGrp', u'TotalOtherRevenue', 'CYOtherRevenueAmt', 'fundraising',
'program_revenue', 'OtherRevMisc', 'other_revenue_binary']
cols = file_list_columns+SOX_columns+control_columns
df = df[cols]
df[cols][:5]
EIN | OrganizationName | URL | SubmittedOn | TaxPeriod | FYE | whistleblower_policy | conflict_of_interest_policy | records_retention_policy | SOX_policies | SOX_policies_binary | SOX_policies_all_binary | tot_rev | total_expenses | program_expenses | program_efficiency | complexity | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
252 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201303199349303505_public.xml | 2013-12-31 | 201212 | FY2012 | 0 | 1 | 1 | 2.0 | 1.0 | 0.0 | 2515399 | 2100922.0 | 1598626.0 | 0.760916 | 2 |
433 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201320249349300417_public.xml | 2013-02-14 | 201112 | FY2011 | 0 | 1 | 1 | 2.0 | 1.0 | 0.0 | 2161209 | 2481310.0 | 1715793.0 | 0.691487 | 0 |
78 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201220909349300327_public.xml | 2012-05-25 | 201012 | FY2010 | 0 | 1 | 1 | 2.0 | 1.0 | 0.0 | 3990564 | 3912643.0 | 2930785.0 | 0.749055 | 0 |
333 | 042129889 | SCHEPENS EYE RESEARCH INSTITUTE INC | https://s3.amazonaws.com/irs-form-990/201502299349301700_public.xml | 2015-09-21 | 201409 | FY2014 | 1 | 1 | 1 | 3.0 | 1.0 | 1.0 | 22995526 | 25626262.0 | 18485740.0 | 0.721359 | 4 |
12 | 042129889 | SCHEPENS EYE RESEARCH INSTITUTE INC | https://s3.amazonaws.com/irs-form-990/201442269349304504_public.xml | 2014-10-28 | 201309 | FY2013 | 1 | 1 | 1 | 3.0 | 1.0 | 1.0 | 25923449 | 26224173.0 | 19280493.0 | 0.735218 | 3 |
14 orgs are missing program_expenses. Dan and I agreed to count these as '0' (see email 9/5/16)
cols2 = ['EIN', 'FYE', 'total_expenses', 'program_expenses', 'program_efficiency']
df[df['program_efficiency'].isnull()][cols2]
EIN | FYE | total_expenses | program_expenses | program_efficiency | |
---|---|---|---|---|---|
172 | 132644507 | FY2014 | 7763846.0 | NaN | NaN |
404 | 132644507 | FY2013 | 7795021.0 | NaN | NaN |
431 | 132644507 | FY2012 | 6237509.0 | NaN | NaN |
106 | 132644507 | FY2011 | 5880789.0 | NaN | NaN |
72 | 132644507 | FY2010 | 6024086.0 | NaN | NaN |
52 | 135579302 | FY2014 | 2023750.0 | NaN | NaN |
406 | 135579302 | FY2013 | 4133761.0 | NaN | NaN |
500 | 201737542 | FY2010 | 177901.0 | NaN | NaN |
0 | 271317072 | FY2014 | 34400.0 | NaN | NaN |
277 | 330837780 | FY2014 | 3330.0 | NaN | NaN |
226 | 591951577 | FY2011 | 1690623.0 | NaN | NaN |
213 | 591951577 | FY2010 | 1161939.0 | NaN | NaN |
495 | 591951577 | FY2010 | 1282149.0 | NaN | NaN |
496 | 953553530 | FY2011 | 0.0 | NaN | NaN |
#df2 = pd.DataFrame(list(filings.find()))
#print '# of columns:', len(df2.columns)
#print '# of observations:', len(df2)
#df2.head(1)
# of columns: 671 # of observations: 538
@documentId | @referenceDocumentId | @referenceDocumentName | @softwareId | @softwareVersion | @softwareVersionNum | AccountantCompileOrReview | AccountantCompileOrReviewBasis | AccountantCompileOrReviewInd | AccountsPayableAccrExpnssGrp | AccountsPayableAccruedExpenses | AccountsReceivable | AccountsReceivableGrp | AcctCompileOrReviewBasisGrp | ActivitiesConductedPartnership | ActivitiesConductedPrtshpInd | Activity2 | Activity3 | ActivityOrMissionDesc | ActivityOrMissionDescription | ActivityOther | AddressChange | AddressChangeInd | AddressPrincipalOfficerUS | Advertising | AdvertisingGrp | AllAffiliatesIncluded | AllAffiliatesIncludedInd | AllOtherContributions | AllOtherContributionsAmt | AllOtherExpenses | AllOtherExpensesGrp | AmendedReturn | AmendedReturnInd | AnnualDisclosureCoveredPersons | AnnualDisclosureCoveredPrsnInd | AuditCommittee | AuditCommitteeInd | AuditedFinancialStmtAttInd | BackupWthldComplianceInd | BalanceSheetAmountsReported | BenefitsPaidToMembersCY | BenefitsPaidToMembersPriorYear | BenefitsToMembers | BenefitsToMembersGrp | BooksInCareOfDetail | BsnssRltnshpThruFamilyMember | BsnssRltnshpWithOrganization | BusinessRlnWithFamMemInd | BusinessRlnWithOfficerEntInd | BusinessRlnWithOrgMemInd | CYBenefitsPaidToMembersAmt | CYContributionsGrantsAmt | CYGrantsAndSimilarPaidAmt | CYInvestmentIncomeAmt | CYOtherExpensesAmt | CYOtherRevenueAmt | CYProgramServiceRevenueAmt | CYRevenuesLessExpensesAmt | CYSalariesCompEmpBnftPaidAmt | CYTotalExpensesAmt | CYTotalFundraisingExpenseAmt | CYTotalProfFndrsngExpnsAmt | CYTotalRevenueAmt | CapStckTrstPrinCurrentFunds | CapStkTrPrinCurrentFundsGrp | CashNonInterestBearing | CashNonInterestBearingGrp | ChangeToOrgDocumentsInd | ChangesToOrganizingDocs | CntrbtnsRprtdFundraisingEvents | CntrctRcvdGreaterThan100KCnt | CollectionsOfArt | CollectionsOfArtInd | CompCurrentOfcrDirectorsGrp | CompCurrentOfficersDirectors | CompDisqualPersons | CompDisqualPersonsGrp | CompensationFromOtherSources | CompensationFromOtherSrcsInd | CompensationProcessCEO | CompensationProcessCEOInd | CompensationProcessOther | CompensationProcessOtherInd | ComplianceWithBackupWitholding | ConferencesMeetings | ConferencesMeetingsGrp | ConflictOfInterestPolicy | ConflictOfInterestPolicyInd | ConservationEasements | ConservationEasementsInd | ConsolidatedAuditFinancialStmt | ConsolidatedAuditFinclStmtInd | ContractorCompensation | ContractorCompensationGrp | ContriRptFundraisingEventAmt | ContributionsGrantsCurrentYear | ContributionsGrantsPriorYear | CostOfGoodsSold | CostOfGoodsSoldAmt | CreditCounseling | CreditCounselingInd | DAFExcessBusinessHoldingsInd | DLN | DecisionsSubjectToApprovaInd | DecisionsSubjectToApproval | DeductibleArtContributionInd | DeductibleContributionsOfArt | DeductibleNonCashContriInd | DeductibleNonCashContributions | DeferredRevenue | DeferredRevenueGrp | DelegationOfManagementDuties | DelegationOfMgmtDutiesInd | DepreciationDepletion | DepreciationDepletionGrp | Desc | DescribedIn501C3 | DescribedInSection501c3Ind | Description | DisregardedEntity | DisregardedEntityInd | DistributionToDonor | DistributionToDonorInd | DoNotFollowSFAS117 | DocumentRetentionPolicy | DocumentRetentionPolicyInd | DoingBusinessAs | DoingBusinessAsName | DonatedServicesAndUseFcltsAmt | DonorAdvisedFundInd | DonorAdvisedFunds | EIN | ElectionOfBoardMembers | ElectionOfBoardMembersInd | EmployeeCnt | EmploymentTaxReturnsFiled | EmploymentTaxReturnsFiledInd | EngagedInExcessBenefitTransInd | EscrowAccount | EscrowAccountInd | EscrowAccountLiability | EscrowAccountLiabilityGrp | ExcessBenefitTransaction | ExcessBusinessHoldings | ExcessBusinessHoldingsInd | Expense | ExpenseAmt | FSAudited | FSAuditedBasis | FSAuditedBasisGrp | FSAuditedInd | FamilyOrBusinessRelationship | FamilyOrBusinessRlnInd | FederalGrantAuditPerformed | FederalGrantAuditPerformedInd | FederalGrantAuditRequired | FederalGrantAuditRequiredInd | FederatedCampaigns | FederatedCampaignsAmt | FeesForServicesAccounting | FeesForServicesAccountingGrp | FeesForServicesInvstMgmntFees | FeesForServicesLegal | FeesForServicesLegalGrp | FeesForServicesLobbying | FeesForServicesLobbyingGrp | FeesForServicesManagement | FeesForServicesManagementGrp | FeesForServicesOther | FeesForServicesOtherGrp | FeesForServicesProfFundraising | FeesForSrvcInvstMgmntFeesGrp | FiledLieu1041 | FinancialStatementBoth | FinancialStatementConsolidated | FinancialStatementSeparate | FinancialStmtAttached | FollowSFAS117 | ForeignActivities | ForeignActivitiesInd | ForeignAddress | ForeignCountryCd | ForeignFinancialAccount | ForeignFinancialAccountInd | ForeignGrants | ForeignGrantsGrp | ForeignOffice | ForeignOfficeInd | Form1098CFiled | Form1098CFiledInd | Form720Filed | Form8282FiledCnt | Form8282PropertyDisposedOf | Form8282PropertyDisposedOfInd | Form8886TFiled | Form8886TFiledInd | Form8899Filed | Form8899Filedind | Form990-TFiled | Form990PartVIISectionA | Form990PartVIISectionAGrp | Form990ProvidedToGoverningBody | Form990ProvidedToGvrnBodyInd | Form990TFiledInd | FormType | FormationYr | FormerOfcrEmployeesListedInd | FormersListed | FundraisingActivities | FundraisingActivitiesInd | FundraisingAmt | FundraisingDirectExpenses | FundraisingDirectExpensesAmt | FundraisingEvents | FundraisingGrossIncomeAmt | FundsToPayPremiums | GainOrLoss | GainOrLossGrp | Gaming | GamingActivitiesInd | GoverningBodyVotingMembersCnt | GovernmentGrants | GovernmentGrantsAmt | GrantAmt | GrantToRelatedPerson | GrantToRelatedPersonInd | Grants | GrantsAndSimilarAmntsCY | GrantsAndSimilarAmntsPriorYear | GrantsPayable | GrantsPayableGrp | GrantsToDomesticIndividuals | GrantsToDomesticIndividualsGrp | GrantsToDomesticOrgs | GrantsToDomesticOrgsGrp | GrantsToIndividuals | GrantsToIndividualsInd | GrantsToOrganizations | GrantsToOrganizationsInd | GrossAmountSalesAssets | GrossAmountSalesAssetsGrp | GrossIncomeFundraisingEvents | GrossReceipts | GrossReceiptsAmt | GrossRents | GrossRentsGrp | GrossSalesOfInventory | GrossSalesOfInventoryAmt | GroupExemptionNum | GroupExemptionNumber | GroupReturnForAffiliates | GroupReturnForAffiliatesInd | Hospital | IRPDocumentCnt | IRPDocumentW2GCnt | IncludeFIN48FootnoteInd | IncmFromInvestBondProceedsGrp | IncomeFromInvestBondProceeds | IndependentAuditFinancialStmt | IndependentAuditFinclStmtInd | IndependentVotingMemberCnt | IndivRcvdGreaterThan100KCnt | IndoorTanningServices | IndoorTanningServicesInd | InfoInScheduleOPartIII | InfoInScheduleOPartIIIInd | InfoInScheduleOPartIX | InfoInScheduleOPartIXInd | InfoInScheduleOPartV | InfoInScheduleOPartVI | InfoInScheduleOPartVII | InfoInScheduleOPartVIII | InfoInScheduleOPartVIIIInd | InfoInScheduleOPartVIIInd | InfoInScheduleOPartVIInd | InfoInScheduleOPartVInd | InfoInScheduleOPartX | InfoInScheduleOPartXI | InfoInScheduleOPartXII | InfoInScheduleOPartXIIInd | InfoInScheduleOPartXIInd | InfoInScheduleOPartXInd | InformationTechnology | InformationTechnologyGrp | InitialReturn | Insurance | InsuranceGrp | IntangibleAssets | IntangibleAssetsGrp | Interest | InterestGrp | InventoriesForSaleOrUse | InventoriesForSaleOrUseGrp | InvestTaxExemptBonds | InvestTaxExemptBondsInd | InvestmentExpenseAmt | InvestmentInJointVenture | InvestmentInJointVentureInd | InvestmentIncome | InvestmentIncomeCurrentYear | InvestmentIncomeGrp | InvestmentIncomePriorYear | InvestmentsOtherSecurities | InvestmentsOtherSecuritiesGrp | InvestmentsProgramRelated | InvestmentsProgramRelatedGrp | InvestmentsPubTradedSecGrp | InvestmentsPubTradedSecurities | IsAvailable | IsElectronic | JointCosts | JointCostsInd | LandBldgEquipAccumDeprecAmt | LandBldgEquipBasisNetGrp | LandBldgEquipCostOrOtherBssAmt | LandBldgEquipmentAccumDeprec | LandBuildingsEquipmentBasis | LandBuildingsEquipmentBasisNet | LastUpdated | LegalDomicileStateCd | LessCostOthBasisSalesExpenses | LessCostOthBasisSalesExpnssGrp | LessRentalExpenses | LessRentalExpensesGrp | LicensedMoreThanOneState | LicensedMoreThanOneStateInd | LoanOutstandingInd | LoanToOfficerOrDQP | LoansFromOfficersDirectors | LoansFromOfficersDirectorsGrp | LobbyingActivities | LobbyingActivitiesInd | LocalChapters | LocalChaptersInd | MaterialDiversionOrMisuse | MaterialDiversionOrMisuseInd | MembersOrStockholders | MembersOrStockholdersInd | MembershipDues | MembershipDuesAmt | MethodOfAccountingAccrual | MethodOfAccountingAccrualInd | MethodOfAccountingCash | MethodOfAccountingCashInd | MethodOfAccountingOther | MethodOfAccountingOtherInd | MinutesOfCommittees | MinutesOfCommitteesInd | MinutesOfGoverningBody | MinutesOfGoverningBodyInd | MiscellaneousRevenue | MiscellaneousRevenueGrp | MissionDesc | MissionDescription | MoreThan5000KToIndividuals | MoreThan5000KToIndividualsInd | MoreThan5000KToOrgInd | MoreThan5000KToOrganizations | MortNotesPyblSecuredInvestProp | MortgNotesPyblScrdInvstPropGrp | NameOfForeignCountry | NameOfPrincipalOfficerBusiness | NameOfPrincipalOfficerPerson | NbrIndependentVotingMembers | NbrVotingGoverningBodyMembers | NbrVotingMembersGoverningBody | NetAssetsOrFundBalancesBOY | NetAssetsOrFundBalancesBOYAmt | NetAssetsOrFundBalancesEOY | NetAssetsOrFundBalancesEOYAmt | NetGainOrLossInvestments | NetGainOrLossInvestmentsGrp | NetIncmFromFundraisingEvtGrp | NetIncomeFromFundraisingEvents | NetIncomeFromGaming | NetIncomeFromGamingGrp | NetIncomeOrLoss | NetIncomeOrLossGrp | NetRentalIncomeOrLoss | NetRentalIncomeOrLossGrp | NetUnrelatedBusTxblIncmAmt | NetUnrelatedBusinessTxblIncome | NetUnrlzdGainsLossesInvstAmt | NoListedPersonsCompensated | NoListedPersonsCompensatedInd | NonDeductibleContributions | NonDeductibleDisclosure | NoncashContributions | NoncashContributionsAmt | NondeductibleContriDisclInd | NondeductibleContributionsInd | NumberFormsTransmittedWith1096 | NumberIndependentVotingMembers | NumberIndividualsGT100K | NumberOf8282Filed | NumberOfContractorsGT100K | NumberOfEmployees | NumberW2GIncluded | ObjectId | Occupancy | OccupancyGrp | OfficeExpenses | OfficeExpensesGrp | OfficerEntityWithBsnssRltnshp | OfficerMailingAddress | OfficerMailingAddressInd | OnBehalfOfIssuer | OnBehalfOfIssuerInd | OperateHospitalInd | OrgDoesNotFollowSFAS117Ind | OrgFiledInLieuOfForm1041Ind | Organization501c | Organization501c3 | Organization501c3Ind | Organization501cInd | OrganizationFollowsSFAS117Ind | OrganizationName | OthNotesLoansReceivableNetGrp | OtherAssetsTotal | OtherAssetsTotalGrp | OtherChangesInNetAssetsAmt | OtherEmployeeBenefits | OtherEmployeeBenefitsGrp | OtherExpensePriorYear | OtherExpenses | OtherExpensesCurrentYear | OtherExpensesGrp | OtherExplainInSchO | OtherInd | OtherLiabilities | OtherLiabilitiesGrp | OtherNotesLoansReceivableNet | OtherRevenueCurrentYear | OtherRevenueMisc | OtherRevenueMiscGrp | OtherRevenuePriorYear | OtherRevenueTotalAmt | OtherSalariesAndWages | OtherSalariesAndWagesGrp | OtherWebsite | OtherWebsiteInd | OwnWebsite | OwnWebsiteInd | PYBenefitsPaidToMembersAmt | PYContributionsGrantsAmt | PYExcessBenefitTransInd | PYGrantsAndSimilarPaidAmt | PYInvestmentIncomeAmt | PYOtherExpensesAmt | PYOtherRevenueAmt | PYProgramServiceRevenueAmt | PYRevenuesLessExpensesAmt | PYSalariesCompEmpBnftPaidAmt | PYTotalExpensesAmt | PYTotalProfFndrsngExpnsAmt | PYTotalRevenueAmt | PaidInCapSrplsLandBldgEqpFund | PartialLiquidation | PartialLiquidationInd | PayPremiumsPrsnlBnftCntrctInd | PaymentsToAffiliates | PaymentsToAffiliatesGrp | PayrollTaxes | PayrollTaxesGrp | PdInCapSrplsLandBldgEqpFundGrp | PensionPlanContributions | PensionPlanContributionsGrp | PermanentlyRestrictedNetAssets | PermanentlyRstrNetAssetsGrp | PledgesAndGrantsReceivable | PledgesAndGrantsReceivableGrp | PoliciesReferenceChapters | PoliciesReferenceChaptersInd | PoliticalActivities | PoliticalCampaignActyInd | PremiumsPaid | PrepaidExpensesDeferredCharges | PrepaidExpensesDefrdChargesGrp | PrincipalOfficerNm | PriorExcessBenefitTransaction | PriorPeriodAdjustmentsAmt | ProfessionalFundraising | ProfessionalFundraisingInd | ProgSrvcAccomActy2Grp | ProgSrvcAccomActy3Grp | ProgSrvcAccomActyOtherGrp | ProgramServiceRevenue | ProgramServiceRevenueCY | ProgramServiceRevenueGrp | ProgramServiceRevenuePriorYear | ProhibitedTaxShelterTrans | ProhibitedTaxShelterTransInd | PymtTravelEntrtnmntPubOfclGrp | QuidProQuoContriDisclInd | QuidProQuoContributions | QuidProQuoContributionsInd | QuidProQuoDisclosure | RcvFndsToPayPrsnlBnftCntrctInd | RcvblFromDisqualifiedPrsnGrp | ReceivablesFromDisqualPersons | ReceivablesFromOfficersEtc | ReceivablesFromOfficersEtcGrp | ReconcilationDonatedServices | ReconcilationInvestExpenses | ReconcilationOtherChanges | ReconcilationPriorAdjustment | ReconcilationRevenueExpenses | ReconcilationRevenueExpnssAmt | ReconciliationUnrealizedInvest | RegularMonitoringEnforcement | RegularMonitoringEnfrcInd | RelatedEntity | RelatedEntityInd | RelatedOrgControlledEntity | RelatedOrganizationCtrlEntInd | RelatedOrganizations | RelatedOrganizationsAmt | RentalIncomeOrLoss | RentalIncomeOrLossGrp | ReportFin48Footnote | ReportInvestOthSecurities | ReportInvestmentsOtherSecInd | ReportLandBldgEquip | ReportLandBuildingEquipmentInd | ReportOtherAssets | ReportOtherAssetsInd | ReportOtherLiabilities | ReportOtherLiabilitiesInd | ReportProgRelInvest | ReportProgramRelatedInvstInd | RetainedEarningsEndowmentEtc | Revenue | RevenueAmt | RevenuesLessExpensesCY | RevenuesLessExpensesPriorYear | Royalties | RoyaltiesGrp | RoyaltiesRevenue | RoyaltiesRevenueGrp | RtnEarnEndowmentIncmOthFndsGrp | SalariesEtcCurrentYear | SalariesEtcPriorYear | SavingsAndTempCashInvestments | SavingsAndTempCashInvstGrp | ScheduleBRequired | ScheduleBRequiredInd | ScheduleJRequired | ScheduleJRequiredInd | ScheduleORequired | ScheduleORequiredInd | School | SchoolOperatingInd | SignificantChange | SignificantChangeInd | SignificantNewProgramServices | SignificantNewProgramSrvcInd | StateLegalDomicile | StatesWhereCopyOfReturnIsFiled | StatesWhereCopyOfReturnIsFldCd | SubjectToProxyTax | SubjectToProxyTaxInd | SubmittedOn | TaxExemptBondLiabilities | TaxExemptBondLiabilitiesGrp | TaxExemptBonds | TaxExemptBondsInd | TaxPeriod | TaxableDistributions | TaxableDistributionsInd | TaxablePartyNotification | TaxablePartyNotificationInd | TempOrPermanentEndowmentsInd | TemporarilyRestrictedNetAssets | TemporarilyRstrNetAssetsGrp | TermOrPermanentEndowments | TerminateOperationsInd | Terminated | TerminationOrContraction | TheBooksAreInCareOf | TotLiabNetAssetsFundBalanceGrp | TotReportableCompRltdOrgAmt | TotalAssets | TotalAssetsBOY | TotalAssetsBOYAmt | TotalAssetsEOY | TotalAssetsEOYAmt | TotalAssetsGrp | TotalCompGT150K | TotalCompGreaterThan150KInd | TotalContributions | TotalContributionsAmt | TotalEmployeeCnt | TotalExpensesCurrentYear | TotalExpensesPriorYear | TotalFunctionalExpenses | TotalFunctionalExpensesGrp | TotalFundrsngExpCurrentYear | TotalGrossUBI | TotalGrossUBIAmt | TotalJointCosts | TotalJointCostsGrp | TotalLiabNetAssetsFundBalances | TotalLiabilities | TotalLiabilitiesBOY | TotalLiabilitiesBOYAmt | TotalLiabilitiesEOY | TotalLiabilitiesEOYAmt | TotalLiabilitiesGrp | TotalNbrEmployees | TotalNbrVolunteers | TotalNetAssetsFundBalanceGrp | TotalNetAssetsFundBalances | TotalOfOtherProgramServiceExp | TotalOfOtherProgramServiceGrnt | TotalOfOtherProgramServiceRev | TotalOthProgramServiceRevGrp | TotalOthProgramServiceRevenue | TotalOtherCompensation | TotalOtherCompensationAmt | TotalOtherProgSrvcExpenseAmt | TotalOtherProgSrvcGrantAmt | TotalOtherProgSrvcRevenueAmt | TotalOtherRevenue | TotalProfFundrsngExpCY | TotalProfFundrsngExpPriorYear | TotalProgramServiceExpense | TotalProgramServiceExpensesAmt | TotalProgramServiceRevenue | TotalProgramServiceRevenueAmt | TotalReportableCompFrmRltdOrgs | TotalReportableCompFromOrg | TotalReportableCompFromOrgAmt | TotalRevenue | TotalRevenueCurrentYear | TotalRevenueGrp | TotalRevenuePriorYear | TotalVolunteersCnt | TransactionRelatedEntity | TransactionWithControlEntInd | TransfersToExemptNonChrtblOrg | Travel | TravelEntrtnmntPublicOfficials | TravelGrp | TrnsfrExmptNonChrtblRltdOrgInd | TypeOfOrganizationCorpInd | TypeOfOrganizationCorporation | TypeOfOrganizationOther | TypeOfOrganizationOtherInd | TypeOfOrganizationTrust | TypeOfOrganizationTrustInd | URL | USAddress | UnrelatedBusIncmOverLimitInd | UnrelatedBusinessIncome | UnrestrictedNetAssets | UnrestrictedNetAssetsGrp | UnsecuredNotesLoansPayable | UnsecuredNotesLoansPayableGrp | UponRequest | UponRequestInd | VotingMembersGoverningBodyCnt | VotingMembersIndependentCnt | WebSite | WebsiteAddressTxt | WhistleblowerPolicy | WhistleblowerPolicyInd | WrittenPolicyOrProcedure | WrittenPolicyOrProcedureInd | YearFormation | _id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | RetDoc1 | RetDoc2 | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | RAISE MONIES TO CONSTRUCT A FACILITY FOR YEAR-ROUND RECREATION PROGRAM DEDICATED TO CHILDREN AND ADULTS WITH SPECIAL NEEDS WHO RESIDE IN THE TOWN OF HEMPSTEAD. | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {u'PhoneNum': u'5164316946', u'USAddress': {u'CityNm': u'LIDO BEACH', u'StateAbbreviationCd': u'NY', u'ZIPCd': u'11561', u'AddressLine1Txt': u'630 LIDO BLVD'}, u'BusinessName': {u'BusinessNameLine1Txt': u'ANCHOR BUILDING FUND INC'}} | NaN | NaN | false | false | false | 0 | 0 | 0 | 0 | 34400 | 325000 | 0 | 290600 | 0 | 34400 | 0 | 0 | 325000 | NaN | NaN | NaN | {u'EOYAmt': u'290600'} | false | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | false | NaN | false | NaN | false | NaN | NaN | NaN | NaN | false | NaN | false | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | 93493316047805 | false | NaN | false | NaN | false | NaN | NaN | NaN | NaN | false | NaN | NaN | THE ORGANIZATIONS EXEMPT PURPOSE WAS TO RAISE MONIES TO CONSTRUCT A FACILTIY TO OPERATE ITS YEAR-ROUND PROGRAM. | NaN | {u'#text': u'true', u'@referenceDocumentId': u'RetDoc3'} | NaN | NaN | false | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | false | NaN | 271317072 | NaN | false | 0 | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalAmt': u'9400', u'ManagementAndGeneralAmt': u'9400'} | NaN | NaN | {u'TotalAmt': u'25000', u'ManagementAndGeneralAmt': u'25000'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | false | NaN | NaN | NaN | false | NaN | false | NaN | NaN | NaN | false | NaN | NaN | NaN | false | NaN | NaN | [{u'PersonNm': u'JOHN MALLOY', u'TitleTxt': u'PRESIDENT', u'AverageHoursPerWeekRt': u'2.00', u'ReportableCompFromRltdOrgAmt': u'0', u'OtherCompensationAmt': u'0', u'OfficerInd': u'X', u'ReportableCompFromOrgAmt': u'0'}, {u'PersonNm': u'ANNA DRAGO... | NaN | false | NaN | 990 | 2009 | false | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | 2 | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | false | NaN | NaN | NaN | NaN | 325000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | 2 | 0 | false | NaN | NaN | NaN | false | 2 | NaN | NaN | false | NaN | X | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | X | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | True | True | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2016-03-21T17:23:53 | NY | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | false | NaN | false | NaN | true | NaN | false | NaN | NaN | NaN | NaN | NaN | X | NaN | NaN | NaN | false | NaN | false | NaN | NaN | RAISE MONIES TO CONSTRUCT A FACILITY FOR YEAR-ROUND RECREATION PROGRAM DEDICATED TO CHILDREN AND ADULTS WITH SPECIAL NEEDS WHO RESIDE IN THE TOWN OF HEMPSTEAD. | NaN | NaN | false | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 290600 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | X | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 201503169349304780 | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | false | NaN | NaN | NaN | NaN | X | NaN | X | ANCHOR BUILDING FUND INC | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumnAmt': u'325000', u'ExclusionAmt': u'325000', u'Desc': u'SETTLEMENT'} | NaN | 325000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | JOHN MALLOY | NaN | NaN | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | NaN | NaN | false | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 290600 | NaN | NaN | NaN | NaN | false | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | false | NaN | false | NaN | false | NaN | false | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | NaN | false | NaN | true | NaN | false | NaN | true | NaN | false | NaN | NaN | NY | NaN | false | 2016-01-30 | NaN | NaN | NaN | false | 201412 | NaN | NaN | NaN | false | false | NaN | NaN | NaN | false | NaN | NaN | NaN | {u'BOYAmt': u'0', u'EOYAmt': u'290600'} | NaN | NaN | NaN | NaN | NaN | 290600 | {u'BOYAmt': u'0', u'EOYAmt': u'290600'} | NaN | false | NaN | NaN | 0 | NaN | NaN | NaN | {u'TotalAmt': u'34400', u'ProgramServicesAmt': u'0', u'ManagementAndGeneralAmt': u'34400', u'FundraisingAmt': u'0'} | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | {u'BOYAmt': u'0', u'EOYAmt': u'0'} | NaN | NaN | {u'BOYAmt': u'0', u'EOYAmt': u'290600'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {u'TotalRevenueColumnAmt': u'325000', u'ExclusionAmt': u'325000'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | false | X | NaN | NaN | NaN | NaN | NaN | https://s3.amazonaws.com/irs-form-990/201503169349304780_public.xml | {u'CityNm': u'LIDO BEACH', u'StateAbbreviationCd': u'NY', u'ZIPCd': u'11561', u'AddressLine1Txt': u'630 LIDO BLVD'} | false | NaN | NaN | {u'EOYAmt': u'290600'} | NaN | NaN | NaN | X | 2 | 2 | NaN | WWW.CAMPANCHOR.ORG | NaN | false | NaN | NaN | NaN | 57cce7823ffc5a55fe7f54b5 |
#efficiency_columns = ['EIN', 'TaxPeriod', 'CYTotalExpensesAmt', 'TotalExpensesCurrentYear',
# 'TotalProgramServiceExpense', 'TotalProgramServiceExpensesAmt',
# 'TotalFunctionalExpenses', 'TotalFunctionalExpensesGrp']
#df2 = df2.sort_values(by=['EIN'], ascending=[1])
#df2[df2['EIN'].isin(zero_prog_exp)][efficiency_columns]
EIN | TaxPeriod | CYTotalExpensesAmt | TotalExpensesCurrentYear | TotalProgramServiceExpense | TotalProgramServiceExpensesAmt | TotalFunctionalExpenses | TotalFunctionalExpensesGrp | |
---|---|---|---|---|---|---|---|---|
404 | 132644507 | 201312 | 7795021 | NaN | NaN | NaN | NaN | {u'TotalAmt': u'7795021'} |
106 | 132644507 | 201112 | NaN | 5880789 | NaN | NaN | {u'Total': u'5880789'} | NaN |
431 | 132644507 | 201212 | NaN | 6237509 | NaN | NaN | {u'Total': u'6237509'} | NaN |
172 | 132644507 | 201412 | 7763846 | NaN | NaN | NaN | NaN | {u'TotalAmt': u'7763846'} |
72 | 132644507 | 201012 | NaN | 6024086 | NaN | NaN | {u'Total': u'6024086'} | NaN |
52 | 135579302 | 201412 | 2023750 | NaN | NaN | NaN | NaN | {u'TotalAmt': u'2023750'} |
406 | 135579302 | 201312 | 4133761 | NaN | NaN | NaN | NaN | {u'TotalAmt': u'4133761', u'ProgramServicesAmt': u'0', u'ManagementAndGeneralAmt': u'0', u'FundraisingAmt': u'0'} |
500 | 201737542 | 201006 | NaN | 177901 | NaN | NaN | {u'ManagementAndGeneral': u'111025', u'ProgramServices': u'0', u'Fundraising': u'66876', u'Total': u'177901'} | NaN |
0 | 271317072 | 201412 | 34400 | NaN | NaN | NaN | NaN | {u'TotalAmt': u'34400', u'ProgramServicesAmt': u'0', u'ManagementAndGeneralAmt': u'34400', u'FundraisingAmt': u'0'} |
339 | 330837780 | 201312 | 2151741 | NaN | NaN | 2151482 | NaN | {u'TotalAmt': u'2151741', u'ProgramServicesAmt': u'2151482', u'ManagementAndGeneralAmt': u'259', u'FundraisingAmt': u'0'} |
345 | 330837780 | 201212 | NaN | 211124 | 211124 | NaN | {u'ManagementAndGeneral': u'0', u'ProgramServices': u'211124', u'Fundraising': u'0', u'Total': u'211124'} | NaN |
216 | 330837780 | 201112 | NaN | 28538 | 26728 | NaN | {u'ManagementAndGeneral': u'1810', u'ProgramServices': u'26728', u'Fundraising': u'0', u'Total': u'28538'} | NaN |
277 | 330837780 | 201412 | 3330 | NaN | NaN | NaN | NaN | {u'TotalAmt': u'3330', u'ProgramServicesAmt': u'0', u'ManagementAndGeneralAmt': u'3330', u'FundraisingAmt': u'0'} |
213 | 591951577 | 201012 | NaN | 1161939 | NaN | NaN | {u'Total': u'1161939'} | NaN |
226 | 591951577 | 201112 | NaN | 1690623 | NaN | NaN | {u'Total': u'1690623'} | NaN |
495 | 591951577 | 201012 | NaN | 1282149 | NaN | NaN | {u'Total': u'1282149'} | NaN |
496 | 953553530 | 201112 | NaN | 0 | NaN | NaN | {u'ManagementAndGeneral': u'0', u'ProgramServices': u'0', u'Fundraising': u'0', u'Total': u'0'} | NaN |
zero_prog_exp = ['132644507', '135579302', '201737542', '271317072', '330837780', '591951577', '953553530']
df[df['EIN'].isin(zero_prog_exp)][cols2]
EIN | FYE | total_expenses | program_expenses | program_efficiency | |
---|---|---|---|---|---|
172 | 132644507 | FY2014 | 7763846.0 | NaN | NaN |
404 | 132644507 | FY2013 | 7795021.0 | NaN | NaN |
431 | 132644507 | FY2012 | 6237509.0 | NaN | NaN |
106 | 132644507 | FY2011 | 5880789.0 | NaN | NaN |
72 | 132644507 | FY2010 | 6024086.0 | NaN | NaN |
52 | 135579302 | FY2014 | 2023750.0 | NaN | NaN |
406 | 135579302 | FY2013 | 4133761.0 | NaN | NaN |
500 | 201737542 | FY2010 | 177901.0 | NaN | NaN |
0 | 271317072 | FY2014 | 34400.0 | NaN | NaN |
277 | 330837780 | FY2014 | 3330.0 | NaN | NaN |
339 | 330837780 | FY2013 | 2151741.0 | 2151482.0 | 0.999880 |
345 | 330837780 | FY2012 | 211124.0 | 211124.0 | 1.000000 |
216 | 330837780 | FY2011 | 28538.0 | 26728.0 | 0.936576 |
226 | 591951577 | FY2011 | 1690623.0 | NaN | NaN |
213 | 591951577 | FY2010 | 1161939.0 | NaN | NaN |
495 | 591951577 | FY2010 | 1282149.0 | NaN | NaN |
496 | 953553530 | FY2011 | 0.0 | NaN | NaN |
print len(df[df['program_efficiency'].isnull()])
print len(df[df['total_expenses'].isnull()])
print len(df[df['program_expenses'].isnull()])
print len(df[df['program_efficiency'].isnull()])
14 0 14 14
df['program_efficiency'] = np.where(df['program_efficiency'].isnull(), 0, df['program_efficiency'])
print len(df[df['program_efficiency'].isnull()])
0
print len(df[df['tot_rev']==0])
print len(df[df['tot_rev']<0])
df['tot_rev_no_neg'] = df['tot_rev']
df['tot_rev_no_neg'] = np.where(df['tot_rev_no_neg']<=0, 1, df['tot_rev_no_neg'])
print len(df[df['tot_rev_no_neg']==0])
print len(df[df['tot_rev_no_neg']<0])
1 2 0 0
df['total_revenue_logged'] = np.log(df['tot_rev_no_neg'])
file_list_columns = ['EIN', 'OrganizationName', 'URL', 'SubmittedOn', 'TaxPeriod', 'FYE']
SOX_columns = ['whistleblower_policy', 'conflict_of_interest_policy', 'records_retention_policy',
'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary']
control_columns = ['tot_rev', 'tot_rev_no_neg', 'total_revenue_logged', 'total_expenses', 'program_expenses',
'program_efficiency', 'complexity']
unneeded_columns = ['CYTotalExpensesAmt', u'TotalExpensesCurrentYear', u'TotalFunctionalExpenses',
'TotalFunctionalExpensesGrp', u'TotalProgramServiceExpense', u'TotalProgramServiceExpensesAmt',
'CYContributionsGrantsAmt', u'FederatedCampaigns', u'FederatedCampaignsAmt', u'MembershipDues',
'MembershipDuesAmt', u'NetIncomeFromFundraisingEvents', u'NetIncmFromFundraisingEvtGrp',
'FundraisingAmt', 'FundraisingGrossIncomeAmt', u'RelatedOrganizations',
'RelatedOrganizationsAmt', u'GovernmentGrants', 'GovernmentGrantsAmt',
'PYProgramServiceRevenueAmt', u'CYProgramServiceRevenueAmt', 'ProgramServiceRevenue',
'ProgramServiceRevenueCY', u'ProgramServiceRevenueGrp', u'TotalProgramServiceRevenue',
'TotalProgramServiceRevenueAmt', u'CYInvestmentIncomeAmt', u'InvestmentIncome',
'InvestmentIncomeCurrentYear', 'InvestmentIncomeGrp', u'OtherRevenueTotalAmt',
'OtherRevenueMiscGrp', u'TotalOtherRevenue', 'CYOtherRevenueAmt', 'fundraising',
'program_revenue', 'OtherRevMisc', 'other_revenue_binary']
cols = file_list_columns+SOX_columns+control_columns
df = df[cols]
df['whistleblower_policy'] = df['whistleblower_policy'].astype('int')
df['conflict_of_interest_policy'] = df['conflict_of_interest_policy'].astype('int')
df['records_retention_policy'] = df['records_retention_policy'].astype('int')
df[:10]
EIN | OrganizationName | URL | SubmittedOn | TaxPeriod | FYE | whistleblower_policy | conflict_of_interest_policy | records_retention_policy | SOX_policies | SOX_policies_binary | SOX_policies_all_binary | tot_rev | tot_rev_no_neg | total_revenue_logged | total_expenses | program_expenses | program_efficiency | complexity | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
252 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201303199349303505_public.xml | 2013-12-31 | 201212 | FY2012 | 0 | 1 | 1 | 2.0 | 1.0 | 0.0 | 2515399 | 2515399 | 14.737942 | 2100922.0 | 1598626.0 | 0.760916 | 2 |
433 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201320249349300417_public.xml | 2013-02-14 | 201112 | FY2011 | 0 | 1 | 1 | 2.0 | 1.0 | 0.0 | 2161209 | 2161209 | 14.586178 | 2481310.0 | 1715793.0 | 0.691487 | 0 |
78 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201220909349300327_public.xml | 2012-05-25 | 201012 | FY2010 | 0 | 1 | 1 | 2.0 | 1.0 | 0.0 | 3990564 | 3990564 | 15.199443 | 3912643.0 | 2930785.0 | 0.749055 | 0 |
333 | 042129889 | SCHEPENS EYE RESEARCH INSTITUTE INC | https://s3.amazonaws.com/irs-form-990/201502299349301700_public.xml | 2015-09-21 | 201409 | FY2014 | 1 | 1 | 1 | 3.0 | 1.0 | 1.0 | 22995526 | 22995526 | 16.950810 | 25626262.0 | 18485740.0 | 0.721359 | 4 |
12 | 042129889 | SCHEPENS EYE RESEARCH INSTITUTE INC | https://s3.amazonaws.com/irs-form-990/201442269349304504_public.xml | 2014-10-28 | 201309 | FY2013 | 1 | 1 | 1 | 3.0 | 1.0 | 1.0 | 25923449 | 25923449 | 17.070658 | 26224173.0 | 19280493.0 | 0.735218 | 3 |
481 | 042129889 | SCHEPENS EYE RESEARCH INSTITUTE INC | https://s3.amazonaws.com/irs-form-990/201312269349301491_public.xml | 2013-12-17 | 201209 | FY2012 | 1 | 1 | 1 | 3.0 | 1.0 | 1.0 | 28579592 | 28579592 | 17.168203 | 25481942.0 | 19252739.0 | 0.755544 | 4 |
16 | 042129889 | SCHEPENS EYE RESEARCH INSTITUTE INC | https://s3.amazonaws.com/irs-form-990/201212219349301091_public.xml | 2012-08-16 | 201106 | FY2011 | 1 | 1 | 1 | 3.0 | 1.0 | 1.0 | 26410002 | 26410002 | 17.089253 | 28367393.0 | 22102327.0 | 0.779146 | 3 |
130 | 042129889 | SCHEPENS EYE RESEARCH INSTITUTE INC | https://s3.amazonaws.com/irs-form-990/201121199349301122_public.xml | 2011-05-04 | 201006 | FY2010 | 1 | 1 | 1 | 3.0 | 1.0 | 1.0 | 28493155 | 28493155 | 17.165174 | 29094188.0 | 24121623.0 | 0.829087 | 3 |
434 | 042701694 | WESTFIELD STATE FOUNDATION INC | https://s3.amazonaws.com/irs-form-990/201403119349300995_public.xml | 2014-12-03 | 201406 | FY2014 | 1 | 1 | 1 | 3.0 | 1.0 | 1.0 | 764437 | 764437 | 13.546895 | 636346.0 | 478219.0 | 0.751508 | 3 |
374 | 042701694 | WESTFIELD STATE FOUNDATION INC | https://s3.amazonaws.com/irs-form-990/201401339349305050_public.xml | 2014-06-27 | 201306 | FY2013 | 1 | 1 | 1 | 3.0 | 1.0 | 1.0 | 745204 | 745204 | 13.521413 | 766452.0 | 446064.0 | 0.581986 | 2 |
df.describe().T
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
whistleblower_policy | 538.0 | 5.892193e-01 | 4.924334e-01 | 0.0 | 0.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 |
conflict_of_interest_policy | 538.0 | 8.438662e-01 | 3.633200e-01 | 0.0 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 |
records_retention_policy | 538.0 | 7.118959e-01 | 4.533013e-01 | 0.0 | 0.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 |
SOX_policies | 538.0 | 2.144981e+00 | 1.097391e+00 | 0.0 | 1.000000e+00 | 3.000000e+00 | 3.000000e+00 | 3.000000e+00 |
SOX_policies_binary | 538.0 | 8.605948e-01 | 3.466912e-01 | 0.0 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 |
SOX_policies_all_binary | 538.0 | 5.446097e-01 | 4.984695e-01 | 0.0 | 0.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 |
tot_rev | 538.0 | 5.857063e+07 | 3.234413e+08 | -218265025.0 | 1.320586e+06 | 4.261394e+06 | 1.334163e+07 | 3.741635e+09 |
tot_rev_no_neg | 538.0 | 5.897644e+07 | 3.232302e+08 | 1.0 | 1.320586e+06 | 4.261394e+06 | 1.334163e+07 | 3.741635e+09 |
total_revenue_logged | 538.0 | 1.524967e+01 | 2.287752e+00 | 0.0 | 1.409358e+01 | 1.526510e+01 | 1.640638e+01 | 2.204279e+01 |
total_expenses | 538.0 | 5.632274e+07 | 3.003909e+08 | 0.0 | 1.375426e+06 | 4.421644e+06 | 1.285109e+07 | 3.287631e+09 |
program_expenses | 524.0 | 4.994483e+07 | 2.653191e+08 | 3884.0 | NaN | NaN | NaN | 2.884879e+09 |
program_efficiency | 538.0 | 7.589501e-01 | 2.257745e-01 | 0.0 | 6.945523e-01 | 8.398745e-01 | 9.007387e-01 | 1.000000e+00 |
complexity | 538.0 | 2.431227e+00 | 1.278642e+00 | 0.0 | 2.000000e+00 | 2.000000e+00 | 3.000000e+00 | 7.000000e+00 |
df.to_pickle('e-file 990s for 2016 donor advisory organizations, v4 (key columns only).pkl')
df.to_excel('e-file 990s for 2016 donor advisory organizations, v4 (key columns only).xls')
df[:2]
EIN | OrganizationName | URL | SubmittedOn | TaxPeriod | FYE | whistleblower_policy | conflict_of_interest_policy | records_retention_policy | SOX_policies | SOX_policies_binary | SOX_policies_all_binary | tot_rev | tot_rev_no_neg | total_revenue_logged | total_expenses | program_expenses | program_efficiency | complexity | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
252 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201303199349303505_public.xml | 2013-12-31 | 201212 | FY2012 | 0 | 1 | 1 | 2.0 | 1.0 | 0.0 | 2515399 | 2515399 | 14.737942 | 2100922.0 | 1598626.0 | 0.760916 | 2 |
433 | 030498214 | NEWARK NOW INC | https://s3.amazonaws.com/irs-form-990/201320249349300417_public.xml | 2013-02-14 | 201112 | FY2011 | 0 | 1 | 1 | 2.0 | 1.0 | 0.0 | 2161209 | 2161209 | 14.586178 | 2481310.0 | 1715793.0 | 0.691487 | 0 |
missing_47 = pd.read_excel('47 missing SOX_updated.xls')
print len(missing_47)
missing_47.head()
47
CN org_id | EIN | FYE (choose latest available on CN) | donor_advisory | conflict_of_interest | records_retention | whistleblower | complexity | total_revenue | program_efficiency | program_expense | total_expense | Unnamed: 12 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10087 | 581925867 | 2015 | 1 | 1.0 | 0.0 | 0.0 | 5.0 | 445953.0 | NaN | 229316.0 | 370526.0 | NaN |
1 | 10552 | 942719901 | 2012 | 1 | 1.0 | 0.0 | 1.0 | 3.0 | 3935913.0 | NaN | 3842824.0 | 4134682.0 | NaN |
2 | 10902 | 262224994 | 2014 | 1 | 1.0 | 1.0 | 1.0 | 3.0 | 706895.0 | NaN | 356046.0 | 565973.0 | NaN |
3 | 11009 | 953523852 | 2014 | 1 | 1.0 | 1.0 | 1.0 | 2.0 | 3620634.0 | NaN | 490708.0 | 3455917.0 | NaN |
4 | 11327 | 720760857 | 2014 | 1 | 1.0 | 1.0 | 1.0 | 5.0 | 4244456.0 | NaN | 4210946.0 | 4942239.0 | NaN |
missing_47_EINs = missing_47['EIN'].tolist()
missing_47_EINs = [str(x) for x in missing_47_EINs]
print len(missing_47_EINs)
print len(set(missing_47_EINs))
missing_47_EINs[:5]
47 47
['581925867', '942719901', '262224994', '953523852', '720760857']
42 of the 47 EINs are in the e-file database.
print len(df[df['EIN'].isin(missing_47_EINs)])
print len(df[df['EIN'].isin(missing_47_EINs)]['EIN'].tolist())
print len(set(df[df['EIN'].isin(missing_47_EINs)]['EIN'].tolist()))
164 164 42