[1]:
from typing import List
from arcgis.features import GeoAccessor
from arcgis.gis import GIS
import cenpy
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
Economic distress index#
Economic distress index using 2017-2021 American Community Survey (ACS) 5-year county level estimates. The index will include the following variables:
Percent households with $30,000 or less income
Percent population with less than a High School degree for population 25 Years and Over
Percent population living below poverty in the past 12 months
Percent unemployed for the population 16 Years and Over
Percent households with cash public assistance or Food Stamps/SNAP
Percent households with Supplemental Security Income (SSI)
Connect to the Census Bureau API#
[2]:
api_database = 'ACSDT5Y2021'
Explain Data Being Accessed#
We can retrieve a through description of the data using the cenpy.explorer.explain method.
[3]:
ex = cenpy.explorer.explain(api_database)
print([k for k in ex.keys()][0])
print('\n', [v for v in ex.values()][0])
American Community Survey: 5-Year Estimates: Detailed Tables 5-Year
The American Community Survey (ACS) is an ongoing survey that provides data every year -- giving communities the current information they need to plan investments and services. The ACS covers a broad range of topics about social, economic, demographic, and housing characteristics of the U.S. population. Summary files include the following geographies: nation, all states (including DC and Puerto Rico), all metropolitan areas, all congressional districts, all counties, all places, and all tracts and block groups. Summary files contain the most detailed cross-tabulations, many of which are published down to block groups. The data are population and housing counts. There are over 64,000 variables in this dataset.
Create APIConnection
Object Instance#
The APIConnection object is going to be the way we interrogate and retrieve data from the US Census Bureau.
[4]:
api_conn = cenpy.remote.APIConnection(api_database)
api_conn
[4]:
Connection to American Community Survey: 5-Year Estimates: Detailed Tables 5-Year (ID: https://api.census.gov/data/id/ACSDT5Y2021)
Retrieve Data#
Our first step is retrieving data from the US Census Bureau. We are going to start with some data introspection, and then retrieve the data to use later for calculating our composite index.
List of Variables to Retrieve#
Although quite a few more are available, here is the list of variables we are going to retrieve.
[5]:
req_vars = [
'B19001_001', # full households
'B19001_002' , # less than 10k
'B19001_003' , # between 10-14k
'B19001_004' , # between 15-19k
'B19001_005' , # between 20-24k
'B19001_006' , # between 25-29k
'B19001B_001', # african american total households
'B19001B_002' , # less than 10k
'B19001B_003' , # between 10-14k
'B19001B_004' , # between 15-19k
'B19001B_005' , # between 20-24k
'B19001B_006' , # between 25-29k
'B19001C_001' , # american indian and alaska native total households
'B19001C_002' , # less than 10k
'B19001C_003' , # between 10-14k
'B19001C_004' , # between 15-19k
'B19001C_005' , # between 20-24k
'B19001C_006' , # between 25-29k
'B19001D_001' , # asian total households
'B19001D_002' , # less than 10k
'B19001D_003' , # between 10-14k
'B19001D_004' , # between 15-19k
'B19001D_005' , # between 20-24k
'B19001D_006' , # between 25-29k
'B19001E_001' , # native hawaiian or pacific islander total households
'B19001E_002' , # less than 10k
'B19001E_003' , # between 10-14k
'B19001E_004' , # between 15-19k
'B19001E_005' , # between 20-24k
'B19001E_006' , # between 25-29k
'B19001H_001' , # nonhispanic white total households
'B19001H_002' , # less than 10k
'B19001H_003' , # between 10-14k
'B19001H_004' , # between 15-19k
'B19001H_005' , # between 20-24k
'B19001H_006' , # between 25-29k
'B19001I_001' , # hispanic or latino total households
'B19001I_002' , # less than 10k
'B19001I_003' , # between 10-14k
'B19001I_004' , # between 15-19k
'B19001I_005' , # between 20-24k
'B19001I_006' , # between 25-29k
'B15003_001' , #education attainment, total, for pop 25 years and over
'B15003_025', # Doctorate degree
'B15003_024', # Professional school degree
'B15003_023', # Master's degree
'B15003_022', # Bachelor's degree
'B15003_021', # Associate's degree
'B15003_020', # some college- 1 or more year
'B15003_019', # some college- less than 1 year
'B15003_018', # GED or alternative credential
'B15003_017', # high school diploma
'B17001_001', # Poverty status in the past 12 months
'B17001_002', # Income in the past 12 months below poverty level
'B23025_002', # in Labor Force - for the population 16 years and over
'B23025_005', # Unemployed
'B19058_002', # with cash public assistance or Food Stamps/SNAP
'B19058_001', #Total estimate
'B19056_002', # with Supplemental Security Income (SSI)
'B19056_001', #Total estimate
]
Add E
to Designate Estimate Variables#
When retrieving variables, since we are interested in variable estimates, we append an E
to the end of all the variables we are going to retrieve.
[6]:
req_vars = [f'{var}E' for var in req_vars]
print(req_vars)
['B19001_001E', 'B19001_002E', 'B19001_003E', 'B19001_004E', 'B19001_005E', 'B19001_006E', 'B19001B_001E', 'B19001B_002E', 'B19001B_003E', 'B19001B_004E', 'B19001B_005E', 'B19001B_006E', 'B19001C_001E', 'B19001C_002E', 'B19001C_003E', 'B19001C_004E', 'B19001C_005E', 'B19001C_006E', 'B19001D_001E', 'B19001D_002E', 'B19001D_003E', 'B19001D_004E', 'B19001D_005E', 'B19001D_006E', 'B19001E_001E', 'B19001E_002E', 'B19001E_003E', 'B19001E_004E', 'B19001E_005E', 'B19001E_006E', 'B19001H_001E', 'B19001H_002E', 'B19001H_003E', 'B19001H_004E', 'B19001H_005E', 'B19001H_006E', 'B19001I_001E', 'B19001I_002E', 'B19001I_003E', 'B19001I_004E', 'B19001I_005E', 'B19001I_006E', 'B15003_001E', 'B15003_025E', 'B15003_024E', 'B15003_023E', 'B15003_022E', 'B15003_021E', 'B15003_020E', 'B15003_019E', 'B15003_018E', 'B15003_017E', 'B17001_001E', 'B17001_002E', 'B23025_002E', 'B23025_005E', 'B19058_002E', 'B19058_001E', 'B19056_002E', 'B19056_001E']
Get Explanation for Variables#
One of the nice things about the cenpy
package is the ability to retrieve intelligible desriptions of the coded variables available. Here, we can retrieve the descriptions associated with the variables we are retrieving for reference.
[7]:
label_df = api_conn.variables[api_conn.variables.index.isin(req_vars)]
label_series = label_df['label'].str.replace('!!', ' ') + ' / ' + label_df['concept']
for itm in label_series.iteritems():
print(f'{itm[0]} -> {itm[1]}'.replace('!!', ' '))
print('Denomentar Value')
for itm in label_series.iteritems():
print(f'{itm[0]} -> {itm[1]}'.replace('!!', ' '))
B19001B_002E -> Estimate Total: Less than $10,000 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (BLACK OR AFRICAN AMERICAN ALONE HOUSEHOLDER)
B19001B_001E -> Estimate Total: / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (BLACK OR AFRICAN AMERICAN ALONE HOUSEHOLDER)
B19001B_006E -> Estimate Total: $25,000 to $29,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (BLACK OR AFRICAN AMERICAN ALONE HOUSEHOLDER)
B19001B_005E -> Estimate Total: $20,000 to $24,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (BLACK OR AFRICAN AMERICAN ALONE HOUSEHOLDER)
B19001B_004E -> Estimate Total: $15,000 to $19,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (BLACK OR AFRICAN AMERICAN ALONE HOUSEHOLDER)
B19001B_003E -> Estimate Total: $10,000 to $14,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (BLACK OR AFRICAN AMERICAN ALONE HOUSEHOLDER)
B19058_001E -> Estimate Total: / PUBLIC ASSISTANCE INCOME OR FOOD STAMPS/SNAP IN THE PAST 12 MONTHS FOR HOUSEHOLDS
B19058_002E -> Estimate Total: With cash public assistance or Food Stamps/SNAP / PUBLIC ASSISTANCE INCOME OR FOOD STAMPS/SNAP IN THE PAST 12 MONTHS FOR HOUSEHOLDS
B19001H_006E -> Estimate Total: $25,000 to $29,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (WHITE ALONE, NOT HISPANIC OR LATINO HOUSEHOLDER)
B19001H_005E -> Estimate Total: $20,000 to $24,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (WHITE ALONE, NOT HISPANIC OR LATINO HOUSEHOLDER)
B19001H_002E -> Estimate Total: Less than $10,000 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (WHITE ALONE, NOT HISPANIC OR LATINO HOUSEHOLDER)
B19001H_001E -> Estimate Total: / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (WHITE ALONE, NOT HISPANIC OR LATINO HOUSEHOLDER)
B19001H_004E -> Estimate Total: $15,000 to $19,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (WHITE ALONE, NOT HISPANIC OR LATINO HOUSEHOLDER)
B19001H_003E -> Estimate Total: $10,000 to $14,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (WHITE ALONE, NOT HISPANIC OR LATINO HOUSEHOLDER)
B19001C_006E -> Estimate Total: $25,000 to $29,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (AMERICAN INDIAN AND ALASKA NATIVE ALONE HOUSEHOLDER)
B19001C_003E -> Estimate Total: $10,000 to $14,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (AMERICAN INDIAN AND ALASKA NATIVE ALONE HOUSEHOLDER)
B19001C_002E -> Estimate Total: Less than $10,000 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (AMERICAN INDIAN AND ALASKA NATIVE ALONE HOUSEHOLDER)
B19001C_005E -> Estimate Total: $20,000 to $24,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (AMERICAN INDIAN AND ALASKA NATIVE ALONE HOUSEHOLDER)
B19001C_004E -> Estimate Total: $15,000 to $19,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (AMERICAN INDIAN AND ALASKA NATIVE ALONE HOUSEHOLDER)
B19001C_001E -> Estimate Total: / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (AMERICAN INDIAN AND ALASKA NATIVE ALONE HOUSEHOLDER)
B17001_001E -> Estimate Total: / POVERTY STATUS IN THE PAST 12 MONTHS BY SEX BY AGE
B17001_002E -> Estimate Total: Income in the past 12 months below poverty level: / POVERTY STATUS IN THE PAST 12 MONTHS BY SEX BY AGE
B19001I_006E -> Estimate Total: $25,000 to $29,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (HISPANIC OR LATINO HOUSEHOLDER)
B19001I_005E -> Estimate Total: $20,000 to $24,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (HISPANIC OR LATINO HOUSEHOLDER)
B19001I_004E -> Estimate Total: $15,000 to $19,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (HISPANIC OR LATINO HOUSEHOLDER)
B19001I_003E -> Estimate Total: $10,000 to $14,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (HISPANIC OR LATINO HOUSEHOLDER)
B19001I_002E -> Estimate Total: Less than $10,000 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (HISPANIC OR LATINO HOUSEHOLDER)
B19001I_001E -> Estimate Total: / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (HISPANIC OR LATINO HOUSEHOLDER)
B19001D_004E -> Estimate Total: $15,000 to $19,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (ASIAN ALONE HOUSEHOLDER)
B19001D_003E -> Estimate Total: $10,000 to $14,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (ASIAN ALONE HOUSEHOLDER)
B19001D_002E -> Estimate Total: Less than $10,000 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (ASIAN ALONE HOUSEHOLDER)
B19001D_001E -> Estimate Total: / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (ASIAN ALONE HOUSEHOLDER)
B19001D_006E -> Estimate Total: $25,000 to $29,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (ASIAN ALONE HOUSEHOLDER)
B19001D_005E -> Estimate Total: $20,000 to $24,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (ASIAN ALONE HOUSEHOLDER)
B23025_005E -> Estimate Total: In labor force: Civilian labor force: Unemployed / EMPLOYMENT STATUS FOR THE POPULATION 16 YEARS AND OVER
B23025_002E -> Estimate Total: In labor force: / EMPLOYMENT STATUS FOR THE POPULATION 16 YEARS AND OVER
B15003_025E -> Estimate Total: Doctorate degree / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_024E -> Estimate Total: Professional school degree / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_023E -> Estimate Total: Master's degree / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_022E -> Estimate Total: Bachelor's degree / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_021E -> Estimate Total: Associate's degree / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_020E -> Estimate Total: Some college, 1 or more years, no degree / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_001E -> Estimate Total: / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_018E -> Estimate Total: GED or alternative credential / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_019E -> Estimate Total: Some college, less than 1 year / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_017E -> Estimate Total: Regular high school diploma / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B19001_006E -> Estimate Total: $25,000 to $29,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS)
B19001_002E -> Estimate Total: Less than $10,000 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS)
B19001_003E -> Estimate Total: $10,000 to $14,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS)
B19001_004E -> Estimate Total: $15,000 to $19,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS)
B19001_005E -> Estimate Total: $20,000 to $24,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS)
B19001_001E -> Estimate Total: / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS)
B19001E_001E -> Estimate Total: / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALONE HOUSEHOLDER)
B19001E_002E -> Estimate Total: Less than $10,000 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALONE HOUSEHOLDER)
B19001E_003E -> Estimate Total: $10,000 to $14,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALONE HOUSEHOLDER)
B19001E_004E -> Estimate Total: $15,000 to $19,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALONE HOUSEHOLDER)
B19001E_005E -> Estimate Total: $20,000 to $24,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALONE HOUSEHOLDER)
B19001E_006E -> Estimate Total: $25,000 to $29,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALONE HOUSEHOLDER)
B19056_001E -> Estimate Total: / SUPPLEMENTAL SECURITY INCOME (SSI) IN THE PAST 12 MONTHS FOR HOUSEHOLDS
B19056_002E -> Estimate Total: With Supplemental Security Income (SSI) / SUPPLEMENTAL SECURITY INCOME (SSI) IN THE PAST 12 MONTHS FOR HOUSEHOLDS
Denomentar Value
B19001B_002E -> Estimate Total: Less than $10,000 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (BLACK OR AFRICAN AMERICAN ALONE HOUSEHOLDER)
B19001B_001E -> Estimate Total: / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (BLACK OR AFRICAN AMERICAN ALONE HOUSEHOLDER)
B19001B_006E -> Estimate Total: $25,000 to $29,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (BLACK OR AFRICAN AMERICAN ALONE HOUSEHOLDER)
B19001B_005E -> Estimate Total: $20,000 to $24,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (BLACK OR AFRICAN AMERICAN ALONE HOUSEHOLDER)
B19001B_004E -> Estimate Total: $15,000 to $19,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (BLACK OR AFRICAN AMERICAN ALONE HOUSEHOLDER)
B19001B_003E -> Estimate Total: $10,000 to $14,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (BLACK OR AFRICAN AMERICAN ALONE HOUSEHOLDER)
B19058_001E -> Estimate Total: / PUBLIC ASSISTANCE INCOME OR FOOD STAMPS/SNAP IN THE PAST 12 MONTHS FOR HOUSEHOLDS
B19058_002E -> Estimate Total: With cash public assistance or Food Stamps/SNAP / PUBLIC ASSISTANCE INCOME OR FOOD STAMPS/SNAP IN THE PAST 12 MONTHS FOR HOUSEHOLDS
B19001H_006E -> Estimate Total: $25,000 to $29,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (WHITE ALONE, NOT HISPANIC OR LATINO HOUSEHOLDER)
B19001H_005E -> Estimate Total: $20,000 to $24,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (WHITE ALONE, NOT HISPANIC OR LATINO HOUSEHOLDER)
B19001H_002E -> Estimate Total: Less than $10,000 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (WHITE ALONE, NOT HISPANIC OR LATINO HOUSEHOLDER)
B19001H_001E -> Estimate Total: / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (WHITE ALONE, NOT HISPANIC OR LATINO HOUSEHOLDER)
B19001H_004E -> Estimate Total: $15,000 to $19,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (WHITE ALONE, NOT HISPANIC OR LATINO HOUSEHOLDER)
B19001H_003E -> Estimate Total: $10,000 to $14,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (WHITE ALONE, NOT HISPANIC OR LATINO HOUSEHOLDER)
B19001C_006E -> Estimate Total: $25,000 to $29,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (AMERICAN INDIAN AND ALASKA NATIVE ALONE HOUSEHOLDER)
B19001C_003E -> Estimate Total: $10,000 to $14,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (AMERICAN INDIAN AND ALASKA NATIVE ALONE HOUSEHOLDER)
B19001C_002E -> Estimate Total: Less than $10,000 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (AMERICAN INDIAN AND ALASKA NATIVE ALONE HOUSEHOLDER)
B19001C_005E -> Estimate Total: $20,000 to $24,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (AMERICAN INDIAN AND ALASKA NATIVE ALONE HOUSEHOLDER)
B19001C_004E -> Estimate Total: $15,000 to $19,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (AMERICAN INDIAN AND ALASKA NATIVE ALONE HOUSEHOLDER)
B19001C_001E -> Estimate Total: / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (AMERICAN INDIAN AND ALASKA NATIVE ALONE HOUSEHOLDER)
B17001_001E -> Estimate Total: / POVERTY STATUS IN THE PAST 12 MONTHS BY SEX BY AGE
B17001_002E -> Estimate Total: Income in the past 12 months below poverty level: / POVERTY STATUS IN THE PAST 12 MONTHS BY SEX BY AGE
B19001I_006E -> Estimate Total: $25,000 to $29,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (HISPANIC OR LATINO HOUSEHOLDER)
B19001I_005E -> Estimate Total: $20,000 to $24,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (HISPANIC OR LATINO HOUSEHOLDER)
B19001I_004E -> Estimate Total: $15,000 to $19,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (HISPANIC OR LATINO HOUSEHOLDER)
B19001I_003E -> Estimate Total: $10,000 to $14,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (HISPANIC OR LATINO HOUSEHOLDER)
B19001I_002E -> Estimate Total: Less than $10,000 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (HISPANIC OR LATINO HOUSEHOLDER)
B19001I_001E -> Estimate Total: / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (HISPANIC OR LATINO HOUSEHOLDER)
B19001D_004E -> Estimate Total: $15,000 to $19,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (ASIAN ALONE HOUSEHOLDER)
B19001D_003E -> Estimate Total: $10,000 to $14,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (ASIAN ALONE HOUSEHOLDER)
B19001D_002E -> Estimate Total: Less than $10,000 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (ASIAN ALONE HOUSEHOLDER)
B19001D_001E -> Estimate Total: / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (ASIAN ALONE HOUSEHOLDER)
B19001D_006E -> Estimate Total: $25,000 to $29,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (ASIAN ALONE HOUSEHOLDER)
B19001D_005E -> Estimate Total: $20,000 to $24,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (ASIAN ALONE HOUSEHOLDER)
B23025_005E -> Estimate Total: In labor force: Civilian labor force: Unemployed / EMPLOYMENT STATUS FOR THE POPULATION 16 YEARS AND OVER
B23025_002E -> Estimate Total: In labor force: / EMPLOYMENT STATUS FOR THE POPULATION 16 YEARS AND OVER
B15003_025E -> Estimate Total: Doctorate degree / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_024E -> Estimate Total: Professional school degree / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_023E -> Estimate Total: Master's degree / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_022E -> Estimate Total: Bachelor's degree / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_021E -> Estimate Total: Associate's degree / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_020E -> Estimate Total: Some college, 1 or more years, no degree / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_001E -> Estimate Total: / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_018E -> Estimate Total: GED or alternative credential / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_019E -> Estimate Total: Some college, less than 1 year / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_017E -> Estimate Total: Regular high school diploma / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B19001_006E -> Estimate Total: $25,000 to $29,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS)
B19001_002E -> Estimate Total: Less than $10,000 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS)
B19001_003E -> Estimate Total: $10,000 to $14,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS)
B19001_004E -> Estimate Total: $15,000 to $19,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS)
B19001_005E -> Estimate Total: $20,000 to $24,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS)
B19001_001E -> Estimate Total: / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS)
B19001E_001E -> Estimate Total: / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALONE HOUSEHOLDER)
B19001E_002E -> Estimate Total: Less than $10,000 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALONE HOUSEHOLDER)
B19001E_003E -> Estimate Total: $10,000 to $14,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALONE HOUSEHOLDER)
B19001E_004E -> Estimate Total: $15,000 to $19,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALONE HOUSEHOLDER)
B19001E_005E -> Estimate Total: $20,000 to $24,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALONE HOUSEHOLDER)
B19001E_006E -> Estimate Total: $25,000 to $29,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALONE HOUSEHOLDER)
B19056_001E -> Estimate Total: / SUPPLEMENTAL SECURITY INCOME (SSI) IN THE PAST 12 MONTHS FOR HOUSEHOLDS
B19056_002E -> Estimate Total: With Supplemental Security Income (SSI) / SUPPLEMENTAL SECURITY INCOME (SSI) IN THE PAST 12 MONTHS FOR HOUSEHOLDS
Make Request to Get Data#
Now, finally, we are going to retrieve the actual data using the APIConnection.query method.
If you notice below, once we retrieve the variables we are interested in, unfortunately, all the scalar values are returned as strings (object
).
[8]:
req_cols = ['NAME' , 'GEO_ID'] + req_vars
df = api_conn.query(
cols=req_cols,
geo_unit='county',
geo_filter={'state': '*'}
)
df.info()
df.head()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3221 entries, 0 to 3220
Data columns (total 64 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 NAME 3221 non-null object
1 GEO_ID 3221 non-null object
2 B19001_001E 3221 non-null object
3 B19001_002E 3221 non-null object
4 B19001_003E 3221 non-null object
5 B19001_004E 3221 non-null object
6 B19001_005E 3221 non-null object
7 B19001_006E 3221 non-null object
8 B19001B_001E 3221 non-null object
9 B19001B_002E 3221 non-null object
10 B19001B_003E 3221 non-null object
11 B19001B_004E 3221 non-null object
12 B19001B_005E 3221 non-null object
13 B19001B_006E 3221 non-null object
14 B19001C_001E 3221 non-null object
15 B19001C_002E 3221 non-null object
16 B19001C_003E 3221 non-null object
17 B19001C_004E 3221 non-null object
18 B19001C_005E 3221 non-null object
19 B19001C_006E 3221 non-null object
20 B19001D_001E 3221 non-null object
21 B19001D_002E 3221 non-null object
22 B19001D_003E 3221 non-null object
23 B19001D_004E 3221 non-null object
24 B19001D_005E 3221 non-null object
25 B19001D_006E 3221 non-null object
26 B19001E_001E 3221 non-null object
27 B19001E_002E 3221 non-null object
28 B19001E_003E 3221 non-null object
29 B19001E_004E 3221 non-null object
30 B19001E_005E 3221 non-null object
31 state 3221 non-null object
32 county 3221 non-null object
33 B19001E_006E 3221 non-null object
34 B19001H_001E 3221 non-null object
35 B19001H_002E 3221 non-null object
36 B19001H_003E 3221 non-null object
37 B19001H_004E 3221 non-null object
38 B19001H_005E 3221 non-null object
39 B19001H_006E 3221 non-null object
40 B19001I_001E 3221 non-null object
41 B19001I_002E 3221 non-null object
42 B19001I_003E 3221 non-null object
43 B19001I_004E 3221 non-null object
44 B19001I_005E 3221 non-null object
45 B19001I_006E 3221 non-null object
46 B15003_001E 3221 non-null object
47 B15003_025E 3221 non-null object
48 B15003_024E 3221 non-null object
49 B15003_023E 3221 non-null object
50 B15003_022E 3221 non-null object
51 B15003_021E 3221 non-null object
52 B15003_020E 3221 non-null object
53 B15003_019E 3221 non-null object
54 B15003_018E 3221 non-null object
55 B15003_017E 3221 non-null object
56 B17001_001E 3221 non-null object
57 B17001_002E 3221 non-null object
58 B23025_002E 3221 non-null object
59 B23025_005E 3221 non-null object
60 B19058_002E 3221 non-null object
61 B19058_001E 3221 non-null object
62 B19056_002E 3221 non-null object
63 B19056_001E 3221 non-null object
dtypes: object(64)
memory usage: 1.6+ MB
[8]:
NAME | GEO_ID | B19001_001E | B19001_002E | B19001_003E | B19001_004E | B19001_005E | B19001_006E | B19001B_001E | B19001B_002E | ... | B15003_018E | B15003_017E | B17001_001E | B17001_002E | B23025_002E | B23025_005E | B19058_002E | B19058_001E | B19056_002E | B19056_001E | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Autauga County, Alabama | 0500000US01001 | 21856 | 1213 | 1051 | 1062 | 1324 | 1048 | 4228 | 422 | ... | 2519 | 10458 | 57790 | 7847 | 27550 | 752 | 2360 | 21856 | 1325 | 21856 |
1 | Baldwin County, Alabama | 0500000US01003 | 87190 | 4251 | 3672 | 2742 | 3340 | 3969 | 7095 | 647 | ... | 8156 | 36186 | 223772 | 20598 | 108873 | 3994 | 6215 | 87190 | 4007 | 87190 |
2 | Barbour County, Alabama | 0500000US01005 | 9088 | 1120 | 647 | 870 | 806 | 538 | 4079 | 800 | ... | 1396 | 5204 | 22250 | 5890 | 9369 | 808 | 2361 | 9088 | 963 | 9088 |
3 | Bibb County, Alabama | 0500000US01007 | 7083 | 528 | 555 | 434 | 388 | 259 | 1288 | 324 | ... | 1496 | 5556 | 21000 | 3558 | 9107 | 884 | 1272 | 7083 | 480 | 7083 |
4 | Blount County, Alabama | 0500000US01009 | 21300 | 1784 | 885 | 943 | 1082 | 1164 | 119 | 18 | ... | 3265 | 11019 | 58323 | 7720 | 25844 | 1554 | 2323 | 21300 | 1862 | 21300 |
5 rows × 64 columns
Convert Values to Scalar (Integers)#
Fortunately, we can convert the columns we are going to be using for analysis to integers quite easily using.
[9]:
df[req_vars] = df[req_vars].astype('int') # convert downloaded data to integers
df.info()
df.head()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3221 entries, 0 to 3220
Data columns (total 64 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 NAME 3221 non-null object
1 GEO_ID 3221 non-null object
2 B19001_001E 3221 non-null int32
3 B19001_002E 3221 non-null int32
4 B19001_003E 3221 non-null int32
5 B19001_004E 3221 non-null int32
6 B19001_005E 3221 non-null int32
7 B19001_006E 3221 non-null int32
8 B19001B_001E 3221 non-null int32
9 B19001B_002E 3221 non-null int32
10 B19001B_003E 3221 non-null int32
11 B19001B_004E 3221 non-null int32
12 B19001B_005E 3221 non-null int32
13 B19001B_006E 3221 non-null int32
14 B19001C_001E 3221 non-null int32
15 B19001C_002E 3221 non-null int32
16 B19001C_003E 3221 non-null int32
17 B19001C_004E 3221 non-null int32
18 B19001C_005E 3221 non-null int32
19 B19001C_006E 3221 non-null int32
20 B19001D_001E 3221 non-null int32
21 B19001D_002E 3221 non-null int32
22 B19001D_003E 3221 non-null int32
23 B19001D_004E 3221 non-null int32
24 B19001D_005E 3221 non-null int32
25 B19001D_006E 3221 non-null int32
26 B19001E_001E 3221 non-null int32
27 B19001E_002E 3221 non-null int32
28 B19001E_003E 3221 non-null int32
29 B19001E_004E 3221 non-null int32
30 B19001E_005E 3221 non-null int32
31 state 3221 non-null object
32 county 3221 non-null object
33 B19001E_006E 3221 non-null int32
34 B19001H_001E 3221 non-null int32
35 B19001H_002E 3221 non-null int32
36 B19001H_003E 3221 non-null int32
37 B19001H_004E 3221 non-null int32
38 B19001H_005E 3221 non-null int32
39 B19001H_006E 3221 non-null int32
40 B19001I_001E 3221 non-null int32
41 B19001I_002E 3221 non-null int32
42 B19001I_003E 3221 non-null int32
43 B19001I_004E 3221 non-null int32
44 B19001I_005E 3221 non-null int32
45 B19001I_006E 3221 non-null int32
46 B15003_001E 3221 non-null int32
47 B15003_025E 3221 non-null int32
48 B15003_024E 3221 non-null int32
49 B15003_023E 3221 non-null int32
50 B15003_022E 3221 non-null int32
51 B15003_021E 3221 non-null int32
52 B15003_020E 3221 non-null int32
53 B15003_019E 3221 non-null int32
54 B15003_018E 3221 non-null int32
55 B15003_017E 3221 non-null int32
56 B17001_001E 3221 non-null int32
57 B17001_002E 3221 non-null int32
58 B23025_002E 3221 non-null int32
59 B23025_005E 3221 non-null int32
60 B19058_002E 3221 non-null int32
61 B19058_001E 3221 non-null int32
62 B19056_002E 3221 non-null int32
63 B19056_001E 3221 non-null int32
dtypes: int32(60), object(4)
memory usage: 880.7+ KB
[9]:
NAME | GEO_ID | B19001_001E | B19001_002E | B19001_003E | B19001_004E | B19001_005E | B19001_006E | B19001B_001E | B19001B_002E | ... | B15003_018E | B15003_017E | B17001_001E | B17001_002E | B23025_002E | B23025_005E | B19058_002E | B19058_001E | B19056_002E | B19056_001E | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Autauga County, Alabama | 0500000US01001 | 21856 | 1213 | 1051 | 1062 | 1324 | 1048 | 4228 | 422 | ... | 2519 | 10458 | 57790 | 7847 | 27550 | 752 | 2360 | 21856 | 1325 | 21856 |
1 | Baldwin County, Alabama | 0500000US01003 | 87190 | 4251 | 3672 | 2742 | 3340 | 3969 | 7095 | 647 | ... | 8156 | 36186 | 223772 | 20598 | 108873 | 3994 | 6215 | 87190 | 4007 | 87190 |
2 | Barbour County, Alabama | 0500000US01005 | 9088 | 1120 | 647 | 870 | 806 | 538 | 4079 | 800 | ... | 1396 | 5204 | 22250 | 5890 | 9369 | 808 | 2361 | 9088 | 963 | 9088 |
3 | Bibb County, Alabama | 0500000US01007 | 7083 | 528 | 555 | 434 | 388 | 259 | 1288 | 324 | ... | 1496 | 5556 | 21000 | 3558 | 9107 | 884 | 1272 | 7083 | 480 | 7083 |
4 | Blount County, Alabama | 0500000US01009 | 21300 | 1784 | 885 | 943 | 1082 | 1164 | 119 | 18 | ... | 3265 | 11019 | 58323 | 7720 | 25844 | 1554 | 2323 | 21300 | 1862 | 21300 |
5 rows × 64 columns
Percentage Calculations#
Next, we are going to calculate a few percentages we will then use for calculating our composite index. To make life easier, we can reuse the function below to take advantage of the variable descriptions to explain which variables are being used, and also calculate the percentage columns at the same time.
[10]:
def add_percent_column(percent_column_name: str, numerator_variables: List[str], denominator_variable: str) -> pd.DataFrame:
"""
Helper function to calculate the percent columns and report the description of the variables being used.
.. note::
This function modifies the data frame in place, so the original data frame will have the newly added column.
Args:
percent_column_name: column name to be added to the data frame
numerator_variables: list of column names to be summed together for calculating the percentage
denominator_variable: column name to be used as the denominator for calculating the percentage
Returns:
Pointer to the updated data frame.
"""
# print the description for each of the numerator variables
print('Numerator Values')
for itm in label_series[numerator_variables].iteritems():
print(f'\t{itm[0]} -> {itm[1]}'.replace('!!', ' '))
# print the description for the denominator
print('Denomentar Value')
for itm in label_series[[denominator_variable]].iteritems():
print(f'\t{itm[0]} -> {itm[1]}'.replace('!!', ' '))
# calculate and add the percent column to the dataframe
df[percent_column_name] = df[numerator_variables].sum(axis=1) / df[denominator_variable]
return df
Low Income#
[11]:
df = add_percent_column(
percent_column_name='pct_less_than_30K',
numerator_variables=['B19001_002E', 'B19001_003E', 'B19001_004E', 'B19001_005E', 'B19001_006E'],
denominator_variable='B19001_001E'
)
df.head()
Numerator Values
B19001_002E -> Estimate Total: Less than $10,000 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS)
B19001_003E -> Estimate Total: $10,000 to $14,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS)
B19001_004E -> Estimate Total: $15,000 to $19,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS)
B19001_005E -> Estimate Total: $20,000 to $24,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS)
B19001_006E -> Estimate Total: $25,000 to $29,999 / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS)
Denomentar Value
B19001_001E -> Estimate Total: / HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS)
[11]:
NAME | GEO_ID | B19001_001E | B19001_002E | B19001_003E | B19001_004E | B19001_005E | B19001_006E | B19001B_001E | B19001B_002E | ... | B15003_017E | B17001_001E | B17001_002E | B23025_002E | B23025_005E | B19058_002E | B19058_001E | B19056_002E | B19056_001E | pct_less_than_30K | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Autauga County, Alabama | 0500000US01001 | 21856 | 1213 | 1051 | 1062 | 1324 | 1048 | 4228 | 422 | ... | 10458 | 57790 | 7847 | 27550 | 752 | 2360 | 21856 | 1325 | 21856 | 0.260706 |
1 | Baldwin County, Alabama | 0500000US01003 | 87190 | 4251 | 3672 | 2742 | 3340 | 3969 | 7095 | 647 | ... | 36186 | 223772 | 20598 | 108873 | 3994 | 6215 | 87190 | 4007 | 87190 | 0.206147 |
2 | Barbour County, Alabama | 0500000US01005 | 9088 | 1120 | 647 | 870 | 806 | 538 | 4079 | 800 | ... | 5204 | 22250 | 5890 | 9369 | 808 | 2361 | 9088 | 963 | 9088 | 0.438050 |
3 | Bibb County, Alabama | 0500000US01007 | 7083 | 528 | 555 | 434 | 388 | 259 | 1288 | 324 | ... | 5556 | 21000 | 3558 | 9107 | 884 | 1272 | 7083 | 480 | 7083 | 0.305520 |
4 | Blount County, Alabama | 0500000US01009 | 21300 | 1784 | 885 | 943 | 1082 | 1164 | 119 | 18 | ... | 11019 | 58323 | 7720 | 25844 | 1554 | 2323 | 21300 | 1862 | 21300 | 0.275023 |
5 rows × 65 columns
Percent with cash public assistance or Food Stamps/SNAP#
[12]:
df = add_percent_column(
percent_column_name='pct_with_SNAP',
numerator_variables=['B19058_002E'],
denominator_variable='B19058_001E'
)
df.head()
Numerator Values
B19058_002E -> Estimate Total: With cash public assistance or Food Stamps/SNAP / PUBLIC ASSISTANCE INCOME OR FOOD STAMPS/SNAP IN THE PAST 12 MONTHS FOR HOUSEHOLDS
Denomentar Value
B19058_001E -> Estimate Total: / PUBLIC ASSISTANCE INCOME OR FOOD STAMPS/SNAP IN THE PAST 12 MONTHS FOR HOUSEHOLDS
[12]:
NAME | GEO_ID | B19001_001E | B19001_002E | B19001_003E | B19001_004E | B19001_005E | B19001_006E | B19001B_001E | B19001B_002E | ... | B17001_001E | B17001_002E | B23025_002E | B23025_005E | B19058_002E | B19058_001E | B19056_002E | B19056_001E | pct_less_than_30K | pct_with_SNAP | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Autauga County, Alabama | 0500000US01001 | 21856 | 1213 | 1051 | 1062 | 1324 | 1048 | 4228 | 422 | ... | 57790 | 7847 | 27550 | 752 | 2360 | 21856 | 1325 | 21856 | 0.260706 | 0.107980 |
1 | Baldwin County, Alabama | 0500000US01003 | 87190 | 4251 | 3672 | 2742 | 3340 | 3969 | 7095 | 647 | ... | 223772 | 20598 | 108873 | 3994 | 6215 | 87190 | 4007 | 87190 | 0.206147 | 0.071281 |
2 | Barbour County, Alabama | 0500000US01005 | 9088 | 1120 | 647 | 870 | 806 | 538 | 4079 | 800 | ... | 22250 | 5890 | 9369 | 808 | 2361 | 9088 | 963 | 9088 | 0.438050 | 0.259793 |
3 | Bibb County, Alabama | 0500000US01007 | 7083 | 528 | 555 | 434 | 388 | 259 | 1288 | 324 | ... | 21000 | 3558 | 9107 | 884 | 1272 | 7083 | 480 | 7083 | 0.305520 | 0.179585 |
4 | Blount County, Alabama | 0500000US01009 | 21300 | 1784 | 885 | 943 | 1082 | 1164 | 119 | 18 | ... | 58323 | 7720 | 25844 | 1554 | 2323 | 21300 | 1862 | 21300 | 0.275023 | 0.109061 |
5 rows × 66 columns
Percent with less than High School Degree#
[13]:
df = add_percent_column(
percent_column_name='pct_less_than_high_school',
numerator_variables=['B15003_025E', 'B15003_024E', 'B15003_023E', 'B15003_022E', 'B15003_021E', 'B15003_020E', 'B15003_019E', 'B15003_018E', 'B15003_017E'],
denominator_variable='B15003_001E'
)
df.head()
Numerator Values
B15003_025E -> Estimate Total: Doctorate degree / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_024E -> Estimate Total: Professional school degree / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_023E -> Estimate Total: Master's degree / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_022E -> Estimate Total: Bachelor's degree / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_021E -> Estimate Total: Associate's degree / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_020E -> Estimate Total: Some college, 1 or more years, no degree / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_019E -> Estimate Total: Some college, less than 1 year / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_018E -> Estimate Total: GED or alternative credential / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
B15003_017E -> Estimate Total: Regular high school diploma / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
Denomentar Value
B15003_001E -> Estimate Total: / EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER
[13]:
NAME | GEO_ID | B19001_001E | B19001_002E | B19001_003E | B19001_004E | B19001_005E | B19001_006E | B19001B_001E | B19001B_002E | ... | B17001_002E | B23025_002E | B23025_005E | B19058_002E | B19058_001E | B19056_002E | B19056_001E | pct_less_than_30K | pct_with_SNAP | pct_less_than_high_school | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Autauga County, Alabama | 0500000US01001 | 21856 | 1213 | 1051 | 1062 | 1324 | 1048 | 4228 | 422 | ... | 7847 | 27550 | 752 | 2360 | 21856 | 1325 | 21856 | 0.260706 | 0.107980 | 0.895845 |
1 | Baldwin County, Alabama | 0500000US01003 | 87190 | 4251 | 3672 | 2742 | 3340 | 3969 | 7095 | 647 | ... | 20598 | 108873 | 3994 | 6215 | 87190 | 4007 | 87190 | 0.206147 | 0.071281 | 0.910142 |
2 | Barbour County, Alabama | 0500000US01005 | 9088 | 1120 | 647 | 870 | 806 | 538 | 4079 | 800 | ... | 5890 | 9369 | 808 | 2361 | 9088 | 963 | 9088 | 0.438050 | 0.259793 | 0.756710 |
3 | Bibb County, Alabama | 0500000US01007 | 7083 | 528 | 555 | 434 | 388 | 259 | 1288 | 324 | ... | 3558 | 9107 | 884 | 1272 | 7083 | 480 | 7083 | 0.305520 | 0.179585 | 0.805381 |
4 | Blount County, Alabama | 0500000US01009 | 21300 | 1784 | 885 | 943 | 1082 | 1164 | 119 | 18 | ... | 7720 | 25844 | 1554 | 2323 | 21300 | 1862 | 21300 | 0.275023 | 0.109061 | 0.836481 |
5 rows × 67 columns
Percent below poverty#
[14]:
df = add_percent_column(
percent_column_name='pct_below_poverty',
numerator_variables=['B17001_002E'],
denominator_variable='B17001_001E'
)
df.head()
Numerator Values
B17001_002E -> Estimate Total: Income in the past 12 months below poverty level: / POVERTY STATUS IN THE PAST 12 MONTHS BY SEX BY AGE
Denomentar Value
B17001_001E -> Estimate Total: / POVERTY STATUS IN THE PAST 12 MONTHS BY SEX BY AGE
[14]:
NAME | GEO_ID | B19001_001E | B19001_002E | B19001_003E | B19001_004E | B19001_005E | B19001_006E | B19001B_001E | B19001B_002E | ... | B23025_002E | B23025_005E | B19058_002E | B19058_001E | B19056_002E | B19056_001E | pct_less_than_30K | pct_with_SNAP | pct_less_than_high_school | pct_below_poverty | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Autauga County, Alabama | 0500000US01001 | 21856 | 1213 | 1051 | 1062 | 1324 | 1048 | 4228 | 422 | ... | 27550 | 752 | 2360 | 21856 | 1325 | 21856 | 0.260706 | 0.107980 | 0.895845 | 0.135785 |
1 | Baldwin County, Alabama | 0500000US01003 | 87190 | 4251 | 3672 | 2742 | 3340 | 3969 | 7095 | 647 | ... | 108873 | 3994 | 6215 | 87190 | 4007 | 87190 | 0.206147 | 0.071281 | 0.910142 | 0.092049 |
2 | Barbour County, Alabama | 0500000US01005 | 9088 | 1120 | 647 | 870 | 806 | 538 | 4079 | 800 | ... | 9369 | 808 | 2361 | 9088 | 963 | 9088 | 0.438050 | 0.259793 | 0.756710 | 0.264719 |
3 | Bibb County, Alabama | 0500000US01007 | 7083 | 528 | 555 | 434 | 388 | 259 | 1288 | 324 | ... | 9107 | 884 | 1272 | 7083 | 480 | 7083 | 0.305520 | 0.179585 | 0.805381 | 0.169429 |
4 | Blount County, Alabama | 0500000US01009 | 21300 | 1784 | 885 | 943 | 1082 | 1164 | 119 | 18 | ... | 25844 | 1554 | 2323 | 21300 | 1862 | 21300 | 0.275023 | 0.109061 | 0.836481 | 0.132366 |
5 rows × 68 columns
Percent with Supplemental Security Income (SSI)#
[15]:
df = add_percent_column(
percent_column_name='pct_with_SSI',
numerator_variables=['B19056_002E'],
denominator_variable='B19056_001E'
)
df.head()
Numerator Values
B19056_002E -> Estimate Total: With Supplemental Security Income (SSI) / SUPPLEMENTAL SECURITY INCOME (SSI) IN THE PAST 12 MONTHS FOR HOUSEHOLDS
Denomentar Value
B19056_001E -> Estimate Total: / SUPPLEMENTAL SECURITY INCOME (SSI) IN THE PAST 12 MONTHS FOR HOUSEHOLDS
[15]:
NAME | GEO_ID | B19001_001E | B19001_002E | B19001_003E | B19001_004E | B19001_005E | B19001_006E | B19001B_001E | B19001B_002E | ... | B23025_005E | B19058_002E | B19058_001E | B19056_002E | B19056_001E | pct_less_than_30K | pct_with_SNAP | pct_less_than_high_school | pct_below_poverty | pct_with_SSI | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Autauga County, Alabama | 0500000US01001 | 21856 | 1213 | 1051 | 1062 | 1324 | 1048 | 4228 | 422 | ... | 752 | 2360 | 21856 | 1325 | 21856 | 0.260706 | 0.107980 | 0.895845 | 0.135785 | 0.060624 |
1 | Baldwin County, Alabama | 0500000US01003 | 87190 | 4251 | 3672 | 2742 | 3340 | 3969 | 7095 | 647 | ... | 3994 | 6215 | 87190 | 4007 | 87190 | 0.206147 | 0.071281 | 0.910142 | 0.092049 | 0.045957 |
2 | Barbour County, Alabama | 0500000US01005 | 9088 | 1120 | 647 | 870 | 806 | 538 | 4079 | 800 | ... | 808 | 2361 | 9088 | 963 | 9088 | 0.438050 | 0.259793 | 0.756710 | 0.264719 | 0.105964 |
3 | Bibb County, Alabama | 0500000US01007 | 7083 | 528 | 555 | 434 | 388 | 259 | 1288 | 324 | ... | 884 | 1272 | 7083 | 480 | 7083 | 0.305520 | 0.179585 | 0.805381 | 0.169429 | 0.067768 |
4 | Blount County, Alabama | 0500000US01009 | 21300 | 1784 | 885 | 943 | 1082 | 1164 | 119 | 18 | ... | 1554 | 2323 | 21300 | 1862 | 21300 | 0.275023 | 0.109061 | 0.836481 | 0.132366 | 0.087418 |
5 rows × 69 columns
Percent Unemployement#
[16]:
df = add_percent_column(
percent_column_name='pct_unemployment',
numerator_variables=['B23025_002E'],
denominator_variable='B23025_005E'
)
df.head()
Numerator Values
B23025_002E -> Estimate Total: In labor force: / EMPLOYMENT STATUS FOR THE POPULATION 16 YEARS AND OVER
Denomentar Value
B23025_005E -> Estimate Total: In labor force: Civilian labor force: Unemployed / EMPLOYMENT STATUS FOR THE POPULATION 16 YEARS AND OVER
[16]:
NAME | GEO_ID | B19001_001E | B19001_002E | B19001_003E | B19001_004E | B19001_005E | B19001_006E | B19001B_001E | B19001B_002E | ... | B19058_002E | B19058_001E | B19056_002E | B19056_001E | pct_less_than_30K | pct_with_SNAP | pct_less_than_high_school | pct_below_poverty | pct_with_SSI | pct_unemployment | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Autauga County, Alabama | 0500000US01001 | 21856 | 1213 | 1051 | 1062 | 1324 | 1048 | 4228 | 422 | ... | 2360 | 21856 | 1325 | 21856 | 0.260706 | 0.107980 | 0.895845 | 0.135785 | 0.060624 | 36.635638 |
1 | Baldwin County, Alabama | 0500000US01003 | 87190 | 4251 | 3672 | 2742 | 3340 | 3969 | 7095 | 647 | ... | 6215 | 87190 | 4007 | 87190 | 0.206147 | 0.071281 | 0.910142 | 0.092049 | 0.045957 | 27.259139 |
2 | Barbour County, Alabama | 0500000US01005 | 9088 | 1120 | 647 | 870 | 806 | 538 | 4079 | 800 | ... | 2361 | 9088 | 963 | 9088 | 0.438050 | 0.259793 | 0.756710 | 0.264719 | 0.105964 | 11.595297 |
3 | Bibb County, Alabama | 0500000US01007 | 7083 | 528 | 555 | 434 | 388 | 259 | 1288 | 324 | ... | 1272 | 7083 | 480 | 7083 | 0.305520 | 0.179585 | 0.805381 | 0.169429 | 0.067768 | 10.302036 |
4 | Blount County, Alabama | 0500000US01009 | 21300 | 1784 | 885 | 943 | 1082 | 1164 | 119 | 18 | ... | 2323 | 21300 | 1862 | 21300 | 0.275023 | 0.109061 | 0.836481 | 0.132366 | 0.087418 | 16.630631 |
5 rows × 70 columns
Data Cleanup#
Next, if the denominator value is zero, the output will be infinity, which is not very useful. For our later steps to work accurately, we replace these with None
.
[17]:
pct_cols = ['pct_less_than_30K', 'pct_less_than_high_school', 'pct_below_poverty', 'pct_with_SNAP', 'pct_with_SSI', 'pct_unemployment']
for pct_col in pct_cols:
df.loc[np.isinf(df[pct_col]), pct_col] = None
Now, with our summary columns calculated, we are pruning the schema, removing columns we not longer want in the table by dropping all the columns we retrieved from the Census API. If you still want these, you can simply skip or remove this cell.
[18]:
df = df.drop(columns=req_vars)
df['FIPS'] = df['state'] + df['county']
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3221 entries, 0 to 3220
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 NAME 3221 non-null object
1 GEO_ID 3221 non-null object
2 state 3221 non-null object
3 county 3221 non-null object
4 pct_less_than_30K 3221 non-null float64
5 pct_with_SNAP 3221 non-null float64
6 pct_less_than_high_school 3221 non-null float64
7 pct_below_poverty 3221 non-null float64
8 pct_with_SSI 3221 non-null float64
9 pct_unemployment 3198 non-null float64
10 FIPS 3221 non-null object
dtypes: float64(6), object(5)
memory usage: 302.0+ KB
[19]:
df.head()
[19]:
NAME | GEO_ID | state | county | pct_less_than_30K | pct_with_SNAP | pct_less_than_high_school | pct_below_poverty | pct_with_SSI | pct_unemployment | FIPS | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Autauga County, Alabama | 0500000US01001 | 01 | 001 | 0.260706 | 0.107980 | 0.895845 | 0.135785 | 0.060624 | 36.635638 | 01001 |
1 | Baldwin County, Alabama | 0500000US01003 | 01 | 003 | 0.206147 | 0.071281 | 0.910142 | 0.092049 | 0.045957 | 27.259139 | 01003 |
2 | Barbour County, Alabama | 0500000US01005 | 01 | 005 | 0.438050 | 0.259793 | 0.756710 | 0.264719 | 0.105964 | 11.595297 | 01005 |
3 | Bibb County, Alabama | 0500000US01007 | 01 | 007 | 0.305520 | 0.179585 | 0.805381 | 0.169429 | 0.067768 | 10.302036 | 01007 |
4 | Blount County, Alabama | 0500000US01009 | 01 | 009 | 0.275023 | 0.109061 | 0.836481 | 0.132366 | 0.087418 | 16.630631 | 01009 |
Calculate Composite Index#
With the percentage columns added, now we can prepare the data and calcualte our composite index. We are doing this here using Python and Pandas, but you can accomplish this same workflow in ArcGIS Pro using the Calculate Composite Index Geoprocessing tool.
Ensure Factors are Equally Scaled#
The percentage columns calculated earlier, their respective ranges vary. To calculate our composite index we are going to need to rescale these columns.
[20]:
df.describe().loc[['min', 'max']]
[20]:
pct_less_than_30K | pct_with_SNAP | pct_less_than_high_school | pct_below_poverty | pct_with_SSI | pct_unemployment | |
---|---|---|---|---|---|---|
min | 0.030303 | 0.000000 | 0.184466 | 0.012048 | 0.000000 | 2.934219 |
max | 0.832910 | 0.630624 | 0.993976 | 0.670737 | 0.352227 | 1444.000000 |
Min-Max Scaler#
In this case, we are keeping it simple by using the MinMaxScaler from SciKit-Learn. Depending on the characteristics of the data you are using, there are a large variety of other ways to scale data.
[21]:
# identify the quantitative columns to scale, and also those to add back on once done with rescaling
quant_cols = ['pct_less_than_30K', 'pct_less_than_high_school', 'pct_below_poverty', 'pct_with_SNAP', 'pct_with_SSI', 'pct_unemployment']
qual_cols = [c for c in df.columns if c not in quant_cols]
# create an instance of the min-max scaler
min_max_trs = MinMaxScaler(feature_range=(0, 1))
# transform the data - since it returns a numpy array, convert back into data frame
df_scaled = pd.DataFrame(min_max_trs.fit_transform(df[quant_cols]), columns=quant_cols)
# combine the non-scalar columns back onto the data frame again
df_scaled = pd.merge(df[qual_cols], df_scaled, left_index=True, right_index=True)
# report minimum and maximums for comparison
df_scaled.describe().loc[['min', 'max']]
[21]:
pct_less_than_30K | pct_less_than_high_school | pct_below_poverty | pct_with_SNAP | pct_with_SSI | pct_unemployment | |
---|---|---|---|---|---|---|
min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
max | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
Apply Weighting Coefficients#
Although we are not changing anyting with this step, it is common practice to apply weighting coefficients to the variables. These coefficients must be between zero and one.
[22]:
# dictionary of coefficients
coefficient_dict = {
'pct_less_than_30K': 1.0,
'pct_less_than_high_school': 1.0,
'pct_below_poverty': 1.0,
'pct_unemployment': 1.0,
'pct_with_SNAP': 1.0,
'pct_with_SSI': 1.0,
}
# apply weighting coefficients
for column_name, coefficient in coefficient_dict.items():
df_scaled[column_name] = df_scaled[column_name] * coefficient
Create Composite Index#
Now we can calculate the composite index by averaging the scaled and weighted percentage columns.
[23]:
df['composite_index'] = df_scaled[quant_cols].sum(axis=1) / len(quant_cols)
df.describe()
[23]:
pct_less_than_30K | pct_with_SNAP | pct_less_than_high_school | pct_below_poverty | pct_with_SSI | pct_unemployment | composite_index | |
---|---|---|---|---|---|---|---|
count | 3221.000000 | 3221.000000 | 3221.000000 | 3221.000000 | 3221.000000 | 3198.000000 | 3221.000000 |
mean | 0.273047 | 0.139425 | 0.876526 | 0.151739 | 0.057440 | 26.068958 | 0.294923 |
std | 0.103110 | 0.084510 | 0.062635 | 0.078536 | 0.030114 | 40.033126 | 0.058435 |
min | 0.030303 | 0.000000 | 0.184466 | 0.012048 | 0.000000 | 2.934219 | 0.086000 |
25% | 0.207489 | 0.087415 | 0.843112 | 0.102226 | 0.038542 | 15.420289 | 0.256916 |
50% | 0.256952 | 0.123949 | 0.890754 | 0.135895 | 0.053060 | 20.301231 | 0.285220 |
75% | 0.320227 | 0.166724 | 0.922745 | 0.180265 | 0.070845 | 27.302729 | 0.318349 |
max | 0.832910 | 0.630624 | 0.993976 | 0.670737 | 0.352227 | 1444.000000 | 0.596631 |
Add Standard Deviation and Quartiles for Composite Index#
In addition to the basic descriptive statistics displayed above, we also can add a column for the standard deviation of our newly added composite index to better understand our new index.
[31]:
col_nm = 'composite_index'
# create name for the columns to be added for standard deviation and quartile columns
std_col = f'{col_nm}_std'
quartile_col = f'{col_nm}_quartile'
# use describe to get descriptive statistics to use for the column
desc = df[col_nm].describe()
# assign the standard deviation for each of the literal values
df[std_col] = (df[col_nm] - desc['mean']) / desc['std']
# assign the relevant quartile to each of the columns
df.loc[df[col_nm] <= desc['25%'], quartile_col] = 1
df.loc[(df[col_nm] > desc['25%']) & (df[col_nm] <= desc['50%']), quartile_col] = 2
df.loc[(df[col_nm] > desc['50%']) & (df[col_nm] <= desc['75%']), quartile_col] = 3
df.loc[df[col_nm] > desc['75%'], quartile_col] = 4
# cast to integer
df[quartile_col] = df[quartile_col].astype('int64')
df[['NAME', 'GEO_ID', col_nm, std_col, quartile_col]].sample(5)
[31]:
NAME | GEO_ID | composite_index | composite_index_std | composite_index_quartile | |
---|---|---|---|---|---|
1856 | Monroe County, New York | 0500000US36055 | 0.295282 | 0.006144 | 3 |
2719 | Refugio County, Texas | 0500000US48391 | 0.302491 | 0.129521 | 3 |
2229 | Lincoln County, Oregon | 0500000US41041 | 0.302976 | 0.137808 | 3 |
2560 | Cherokee County, Texas | 0500000US48073 | 0.279280 | -0.267690 | 2 |
2367 | Brookings County, South Dakota | 0500000US46011 | 0.243141 | -0.886144 | 1 |
Making the Data Geographic - Adding Geometry#
To view the data on a map, we need to get geometries, polygons, of the counties, add them to the data created above, and publish this data so it can be viewed.
Connect Using a GIS
Object Instance#
In this case, I am using a GIS
profile. However, there are a multitude of methods for authenticaing with ArcGIS Online.
[27]:
gis = GIS(profile='ba')
gis
[27]:
Access the County Boundaries Item
#
Items reference resources you can use through ArcGIS Online. In this case, we are taking advantage of an administrative boundaries dataset made available through ArcGIS Living Atlas.
[28]:
itm = gis.content.get('9859918e235f47ce90deff19bed75010')
itm
[28]:
View Item Layers#
This Item references variety of Feature Layers we can use. In our case, we need the Counties, the layer referenced by the 2
index.
[29]:
for lyr in itm.layers:
print(lyr.properties.name, lyr.properties.id)
USA_Country 0
USA_State 1
USA_County 2
USA_ZipCode 3
USA_Tract 4
USA_BlockGroup 5
Create a Layer Object Instance#
Using the correct index, we can retrieve the Counties as a FeatureLayer, the object enabling access to the underlying data.
[30]:
lyr = itm.layers[2]
lyr
[30]:
<FeatureLayer url:"https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/USA_Boundaries_2021/FeatureServer/2">
Retrieve Data Using a Query#
Through the layer, we can query to retrieve the data as a FeatureSet. Using function chaining, we can immediately convert this to a Spatially Enabled DataFrame using the sdf
property of
the FeatureSet
.
[31]:
lyr_df = lyr.query(out_fields=['FIPS'], out_sr=4326).sdf
lyr_df.info()
lyr_df.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3143 entries, 0 to 3142
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 OBJECTID 3143 non-null Int64
1 FIPS 3143 non-null string
2 SHAPE 3143 non-null geometry
dtypes: Int64(1), geometry(1), string(1)
memory usage: 76.9 KB
[31]:
OBJECTID | FIPS | SHAPE | |
---|---|---|---|
0 | 1 | 01001 | {"rings": [[[-86.413120727, 32.7073921370001],... |
1 | 2 | 01003 | {"rings": [[[-87.5649079999999, 30.281622], [-... |
2 | 3 | 01005 | {"rings": [[[-85.257838372, 32.147937056], [-8... |
3 | 4 | 01007 | {"rings": [[[-87.06574294, 33.2469132270001], ... |
4 | 5 | 01009 | {"rings": [[[-86.453024823, 34.259323463], [-8... |
Combine Geometry and Custom Data#
Using a join, we can now combine our custom data with the percent columns and the compsite index describing economic distress with the geometries retrieved above to create a new Spatially Enabled DataFrame.
[32]:
df_geo = df.join(lyr_df[['FIPS', 'SHAPE']].set_index('FIPS'), on='FIPS', how='right')
df_geo.spatial.set_geometry('SHAPE')
df_geo.info()
df_geo.head()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3143 entries, 0 to 3142
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 NAME 3143 non-null object
1 GEO_ID 3143 non-null object
2 state 3143 non-null object
3 county 3143 non-null object
4 pct_less_than_30K 3143 non-null float64
5 pct_with_SNAP 3143 non-null float64
6 pct_less_than_high_school 3143 non-null float64
7 pct_below_poverty 3143 non-null float64
8 pct_with_SSI 3143 non-null float64
9 pct_unemployment 3120 non-null float64
10 FIPS 3143 non-null object
11 composite_index 3143 non-null float64
12 composite_index_std 3143 non-null float64
13 composite_index_quartile 3143 non-null int64
14 SHAPE 3143 non-null geometry
dtypes: float64(8), geometry(1), int64(1), object(5)
memory usage: 392.9+ KB
[32]:
NAME | GEO_ID | state | county | pct_less_than_30K | pct_with_SNAP | pct_less_than_high_school | pct_below_poverty | pct_with_SSI | pct_unemployment | FIPS | composite_index | composite_index_std | composite_index_quartile | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Autauga County, Alabama | 0500000US01001 | 01 | 001 | 0.260706 | 0.107980 | 0.895845 | 0.135785 | 0.060624 | 36.635638 | 01001 | 0.286738 | -0.140066 | 3 | {"rings": [[[-86.413120727, 32.7073921370001],... |
1 | Baldwin County, Alabama | 0500000US01003 | 01 | 003 | 0.206147 | 0.071281 | 0.910142 | 0.092049 | 0.045957 | 27.259139 | 01003 | 0.249562 | -0.776255 | 1 | {"rings": [[[-87.5649079999999, 30.281622], [-... |
2 | Barbour County, Alabama | 0500000US01005 | 01 | 005 | 0.438050 | 0.259793 | 0.756710 | 0.264719 | 0.105964 | 11.595297 | 01005 | 0.386223 | 1.562417 | 4 | {"rings": [[[-85.257838372, 32.147937056], [-8... |
3 | Bibb County, Alabama | 0500000US01007 | 01 | 007 | 0.305520 | 0.179585 | 0.805381 | 0.169429 | 0.067768 | 10.302036 | 01007 | 0.305191 | 0.175712 | 3 | {"rings": [[[-87.06574294, 33.2469132270001], ... |
4 | Blount County, Alabama | 0500000US01009 | 01 | 009 | 0.275023 | 0.109061 | 0.836481 | 0.132366 | 0.087418 | 16.630631 | 01009 | 0.287274 | -0.130889 | 3 | {"rings": [[[-86.453024823, 34.259323463], [-8... |