Preprocessing

Because Business Analyst makes a massive amount of data available, it is fairly straightforward to integrate that data into pipelines for subsequent analysis. These examples demonstrate how the enrich method can be wrapped in a scikit-learn Transformer and used as a preprocessor.

class ba_examples.preprocessing.ArrayToDataFrame(columns_template, index=None)

Bases: BaseEstimator, TransformerMixin

Helper to convert an np.ndarray output by a preceding step back into a Pandas DataFrame.

Parameters:
  • columns_template (Union[DataFrame, List[str]]) – Template of columns to use when creating the data frame.

  • index (Optional[list]) – Index to add on data frame.

fit(X)

Fit method, which just sets properties.

Parameters:

X (ndarray) – np.ndarray to be converted into a Pandas DataFrame.

transform(X)

Convert the np.ndarray into a Pandas DataFrame.

Parameters:

X (ndarray) – np.ndarray to be converted into a Pandas Data Frame.

Returns:

A Pandas DataFrame containing the data from the np.ndarray, using the columns from columns_template.
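To illustrate where this helper fits, the sketch below uses a hypothetical ArrayToDataFrameSketch class (not the ba_examples implementation) to restore column names after scikit-learn's StandardScaler, which returns a plain np.ndarray by default. The column names and values are made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ArrayToDataFrameSketch(BaseEstimator, TransformerMixin):
    """Hypothetical re-implementation: put column names (and an index) back on an ndarray."""

    def __init__(self, columns_template, index=None):
        self.columns_template = columns_template
        self.index = index

    def fit(self, X, y=None):
        # Accept either a DataFrame (use its columns) or a plain list of column names.
        if isinstance(self.columns_template, pd.DataFrame):
            self.columns_ = list(self.columns_template.columns)
        else:
            self.columns_ = list(self.columns_template)
        return self

    def transform(self, X):
        # Wrap the array in a DataFrame using the remembered column names.
        return pd.DataFrame(np.asarray(X), columns=self.columns_, index=self.index)


df = pd.DataFrame(
    {"pop": [100.0, 200.0, 300.0], "income": [40.0, 50.0, 60.0]},
    index=["a", "b", "c"],
)

pipe = Pipeline([
    ("scale", StandardScaler()),  # outputs an np.ndarray, dropping the column names
    ("to_df", ArrayToDataFrameSketch(columns_template=df, index=df.index)),
])
out = pipe.fit_transform(df)
print(out.columns.tolist())  # ['pop', 'income']
```

Passing the original index alongside the template keeps row identifiers (for example, area IDs) aligned with the scaled values.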

class ba_examples.preprocessing.EnrichBase

Bases: BaseEstimator, TransformerMixin

The arcgis.geoenrichment.Country.enrich method provides access to a massive amount of data for analysis. This object streamlines use of that method in a scikit-learn Pipeline by wrapping it in a Transformer, specifically a preprocessor, and serves as the base for other transformers that perform more specific tasks.

property country

arcgis.geoenrichment.Country object instance being used.

property enrich_var_aliases

List of enrich aliases, so you can understand what the variables are.

property enrich_variables

Pandas data frame of variables being used for enrichment.

fit(X)

Since this is only a preprocessor, nothing happens here.

property return_geometry

Whether to return the geometry when enriching.
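The wrapping idea behind EnrichBase can be sketched as a minimal scikit-learn transformer: configuration is stored in __init__, fit does nothing, and transform delegates to the country's enrich method. StubCountry and its enrich(geographies, enrich_variables, return_geometry) signature are hypothetical stand-ins so the sketch runs without an ArcGIS connection; they are not the arcgis API, and the variable names are placeholders.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class StubCountry:
    """Hypothetical stand-in for arcgis.geoenrichment.Country (no network calls)."""

    def enrich(self, geographies, enrich_variables, return_geometry=True):
        # Fake one row per input geography and one column per requested variable.
        frame = pd.DataFrame(
            {var: [float(i) for i in range(len(geographies))] for var in enrich_variables}
        )
        if return_geometry:
            frame["SHAPE"] = list(geographies)
        return frame


class EnrichSketch(BaseEstimator, TransformerMixin):
    """Hypothetical sketch of the EnrichBase idea, not the ba_examples class."""

    def __init__(self, country, enrich_variables, return_geometry=True):
        self.country = country
        self.enrich_variables = enrich_variables
        self.return_geometry = return_geometry

    def fit(self, X, y=None):
        # Nothing to learn; returning self follows the scikit-learn convention.
        return self

    def transform(self, X):
        # Delegate the actual enrichment to the country object.
        return self.country.enrich(
            list(X), self.enrich_variables, return_geometry=self.return_geometry
        )


enricher = EnrichSketch(StubCountry(), ["totpop", "medhinc"], return_geometry=False)
result = enricher.fit_transform(["area_1", "area_2", "area_3"])
print(result.shape)  # (3, 2)
```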

class ba_examples.preprocessing.EnrichPolygon(country, enrich_variables, return_geometry=True)

Bases: EnrichBase

The arcgis.geoenrichment.Country.enrich method wrapped in a preprocessor for enriching input areas delineated by arcgis.geometry.Polygon geometries. Inherits from EnrichBase.

Parameters:
  • country (Country) – Country to be used for enrichment.

  • enrich_variables (Union[List[str], DataFrame]) – A list of enrich variable names or filtered dataframe of enrich variables to be used.

  • return_geometry (bool) – Whether to return the geometries (shapes) with the enriched data.

transform(X)

Retrieve Pandas Data Frame of enriched data.

Parameters:

X (Union[DataFrame, List[Polygon], ndarray]) – List of Polygon geometries or Spatially Enabled DataFrame of areas to be enriched.

Return type:

DataFrame

Returns:

Enriched data.
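Because transform accepts a Spatially Enabled DataFrame, a list of Polygons, or an ndarray, a preprocessing step may need to normalize those forms into one flat list. The helper below is a hypothetical sketch of that normalization, not the class's code; the SHAPE geometry column name is an assumption, and strings stand in for Polygon objects.

```python
import numpy as np
import pandas as pd


def as_geometry_list(X, geometry_column="SHAPE"):
    """Hypothetical helper: normalize the input types EnrichPolygon.transform accepts."""
    if isinstance(X, pd.DataFrame):
        # Spatially Enabled DataFrame: take the geometry column (name assumed to be SHAPE).
        return X[geometry_column].tolist()
    if isinstance(X, np.ndarray):
        # A pipeline may pass a 2-D column vector; flatten it into a list of objects.
        return X.ravel().tolist()
    # Anything else iterable (e.g. a list of Polygons) becomes a plain list.
    return list(X)


frame = pd.DataFrame({"SHAPE": ["poly_a", "poly_b"], "name": ["A", "B"]})
print(as_geometry_list(frame))  # ['poly_a', 'poly_b']
```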

class ba_examples.preprocessing.EnrichStandardGeography(country, enrich_variables, standard_geography_level=<class 'str'>, return_geometry=True)

Bases: EnrichBase

The arcgis.geoenrichment.Country.enrich method wrapped in a preprocessor for enriching a list of standard geographies identified by their unique identifiers. A common example is postal or ZIP codes.

Parameters:
  • country (Country) – Country to be used for enrichment.

  • enrich_variables (Union[List[str], DataFrame]) – A list of enrich variable names or filtered dataframe of enrich variables to be used.

  • standard_geography_level – Standard geography level to use for enrichment.

  • return_geometry (bool) – Whether to return the geometries (shapes) with the enriched data.

transform(X)

Retrieve Pandas Data Frame of enriched data.

Parameters:

X (Union[DataFrame, List[Polygon], ndarray]) – List of standard geography unique identifiers.

Return type:

DataFrame

Returns:

Enriched data.
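Because standard geographies are looked up by identifier, ZIP codes stored as integers can silently lose their leading zeros (02134 becomes 2134). A hypothetical helper that restores them before enrichment might look like this, assuming five-digit US ZIP codes and, for DataFrame input, identifiers in the first column.

```python
import numpy as np
import pandas as pd


def normalize_zip_codes(X, width=5):
    """Hypothetical helper: turn ZIP codes into zero-padded strings before enrichment."""
    if isinstance(X, pd.DataFrame):
        # Assumption: the identifiers are in the first column.
        values = X.iloc[:, 0]
    else:
        # Lists and ndarrays (including 2-D column vectors) are flattened into a Series.
        values = pd.Series(np.asarray(X).ravel())
    # Integers such as 2134 have lost the leading zero of "02134"; pad them back.
    return values.astype(str).str.zfill(width).tolist()


print(normalize_zip_codes([2134, 98101]))  # ['02134', '98101']
```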

class ba_examples.preprocessing.KeepOnlyEnrichColumns(country, id_column=None, keep_geometry=True)

Bases: BaseEstimator, TransformerMixin

Remove any non-enrich variable columns from a Pandas data frame.

Parameters:
  • country (Country) – arcgis.geoenrichment.Country object used for original enrichment.

  • id_column (Optional[str]) – Column with unique identifiers. This will become the output index. If no column specified, the existing index will be used.

  • keep_geometry (bool) – Whether to keep the geometry, if applicable.

fit(X)

Sets properties based on the input parameters and data.

Parameters:

X (DataFrame) – Pandas data frame created from the arcgis.geoenrichment.Country.enrich method.

Returns:

Pandas DataFrame pruned to just retain columns from enrichment.

transform(X)

Remove any non-enrich variable columns from the data frame.

Parameters:

X (DataFrame) – Pandas data frame output from arcgis.geoenrichment.Country.enrich method.

Returns:

Pandas data frame with only enrich columns, the identifier column as the index, and the geometry column, if applicable.
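The pruning KeepOnlyEnrichColumns describes can be approximated with plain pandas. The function below is a hypothetical sketch, not the class's implementation; it assumes the enrich variable names are known in advance and that geometry lives in a SHAPE column, and the sample column names are illustrative only.

```python
import pandas as pd


def keep_only_enrich_columns(df, enrich_names, id_column=None, keep_geometry=True,
                             geometry_column="SHAPE"):
    """Hypothetical sketch of the column pruning KeepOnlyEnrichColumns describes."""
    # Use the identifier column as the index if given; otherwise keep the existing index.
    out = df.set_index(id_column) if id_column is not None else df
    wanted = set(enrich_names)
    # Preserve the original column order while dropping non-enrich columns.
    keep = [col for col in out.columns if col in wanted]
    if keep_geometry and geometry_column in out.columns:
        keep.append(geometry_column)
    return out[keep]


enriched = pd.DataFrame({
    "OBJECTID": [1, 2],
    "zip": ["02134", "98101"],
    "sourceCountry": ["US", "US"],
    "totpop": [1000, 2000],
    "medhinc": [50000, 60000],
    "SHAPE": ["geom_a", "geom_b"],
})
pruned = keep_only_enrich_columns(enriched, ["totpop", "medhinc"], id_column="zip")
print(pruned.columns.tolist())  # ['totpop', 'medhinc', 'SHAPE']
```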