Metabase
Important Capabilities
Capability | Status | Notes |
---|---|---|
Platform Instance | ✅ | Enabled by default |
This plugin extracts charts, dashboards, and associated metadata. It is in beta and has only been tested on PostgreSQL and H2 databases.
Dashboard
The /api/dashboard endpoint is used to retrieve the following dashboard information:
- Title and description
- Last edited by
- Owner
- Link to the dashboard in Metabase
- Associated charts
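As a rough illustration of the list above, the following sketch picks those fields out of a single dashboard payload. The payload keys (`name`, `description`, `creator_id`, `last-edit-info`, `ordered_cards`, `card_id`) and the `/dashboard/<id>` link format are assumptions about the Metabase response, not something this page confirms, and `summarize_dashboard` is a hypothetical helper rather than part of the plugin.

```python
from typing import Any, Dict, List


def summarize_dashboard(connect_uri: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the dashboard fields listed above out of one dashboard payload.

    Every payload key used here is an assumption about the Metabase response.
    """
    last_edit = payload.get("last-edit-info") or {}
    cards: List[Dict[str, Any]] = payload.get("ordered_cards") or []
    return {
        "title": payload.get("name"),
        "description": payload.get("description"),
        "last_edited_by": last_edit.get("email"),
        "owner_id": payload.get("creator_id"),
        # Assumed link format: <host>/dashboard/<id>
        "link": f"{connect_uri.rstrip('/')}/dashboard/{payload['id']}",
        # Entries without a card_id (for example text boxes) are skipped.
        "chart_ids": [c["card_id"] for c in cards if c.get("card_id") is not None],
    }


# Made-up sample payload, for illustration only.
sample = {
    "id": 7,
    "name": "Sales overview",
    "description": "Weekly sales",
    "creator_id": 1,
    "last-edit-info": {"email": "jane@example.com"},
    "ordered_cards": [{"card_id": 11}, {"card_id": None}, {"card_id": 12}],
}
summary = summarize_dashboard("http://localhost:3000/", sample)
print(summary["link"])
```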
Chart
The /api/card endpoint is used to retrieve the following chart information:
- Title and description
- Last edited by
- Owner
- Link to the chart in Metabase
- Datasource and lineage
The following chart properties are ingested into DataHub:
Name | Description |
---|---|
Dimensions | Column names |
Filters | Any filters applied to the chart |
Metrics | All columns that are being used for aggregation |
CLI based Ingestion
Install the Plugin
pip install 'acryl-datahub[metabase]'
Config Details
Note that a period (.) is used to denote nested fields in the YAML recipe.
Field | Type | Description | Default |
---|---|---|---|
connect_uri | string | Metabase host URL. | localhost:3000 |
database_alias_map | object | Database name map to use when constructing dataset URN. | |
default_schema | string | Default schema name to use when schema is not provided in an SQL query. | public |
engine_platform_map | map(str,string) | Custom mappings between Metabase database engines and DataHub platforms. | |
password | string(password) | Metabase password. | |
platform_instance_map | map(str,string) | A holder for platform -> platform_instance mappings to generate correct dataset URNs. | |
username | string | Metabase username. | |
env | string | The environment that all assets produced by this connector belong to. | PROD |
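To show how these fields fit together, here is a minimal sketch of a recipe written as a Python dictionary. All values are placeholders, and the `datahub-rest` sink pointing at `http://localhost:8080` is an assumption about your deployment. `run_ingestion` is a hypothetical wrapper; it needs the plugin installed and a reachable Metabase instance, so it is defined but not called here.

```python
from typing import Any, Dict

# Placeholder values; replace them with your own Metabase and DataHub details.
recipe: Dict[str, Any] = {
    "source": {
        "type": "metabase",
        "config": {
            "connect_uri": "http://localhost:3000",
            "username": "metabase-user@example.com",
            "password": "example-password",
            "default_schema": "public",
            "env": "PROD",
            "engine_platform_map": {"athena": "glue"},
        },
    },
    # Assumed sink: a DataHub REST endpoint running on localhost.
    "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
}


def run_ingestion(recipe_dict: Dict[str, Any]) -> None:
    """Run the recipe; requires acryl-datahub[metabase] and a reachable Metabase."""
    # Imported lazily so the recipe above can be inspected without DataHub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe_dict)
    pipeline.run()
    pipeline.raise_from_status()
```

The same structure maps one-to-one onto a YAML recipe, with the options placed under the source's config key.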
The JSON Schema for this configuration is inlined below.
{
  "title": "MetabaseConfig",
  "description": "Any non-Dataset source that produces lineage to Datasets should inherit this class.\ne.g. Orchestrators, Pipelines, BI Tools etc.",
  "type": "object",
  "properties": {
    "env": {
      "title": "Env",
      "description": "The environment that all assets produced by this connector belong to",
      "default": "PROD",
      "type": "string"
    },
    "platform_instance_map": {
      "title": "Platform Instance Map",
      "description": "A holder for platform -> platform_instance mappings to generate correct dataset urns",
      "type": "object",
      "additionalProperties": {
        "type": "string"
      }
    },
    "connect_uri": {
      "title": "Connect Uri",
      "description": "Metabase host URL.",
      "default": "localhost:3000",
      "type": "string"
    },
    "username": {
      "title": "Username",
      "description": "Metabase username.",
      "type": "string"
    },
    "password": {
      "title": "Password",
      "description": "Metabase password.",
      "type": "string",
      "writeOnly": true,
      "format": "password"
    },
    "database_alias_map": {
      "title": "Database Alias Map",
      "description": "Database name map to use when constructing dataset URN.",
      "type": "object"
    },
    "engine_platform_map": {
      "title": "Engine Platform Map",
      "description": "Custom mappings between metabase database engines and DataHub platforms",
      "type": "object",
      "additionalProperties": {
        "type": "string"
      }
    },
    "default_schema": {
      "title": "Default Schema",
      "description": "Default schema name to use when schema is not provided in an SQL query",
      "default": "public",
      "type": "string"
    }
  },
  "additionalProperties": false
}
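The schema above implies two behaviours worth spelling out: three options carry defaults, and `"additionalProperties": false` means unrecognized top-level keys are rejected. The sketch below mimics both with plain Python; `resolve_config` is a hypothetical helper for illustration and is not how the plugin itself validates configuration.

```python
from typing import Any, Dict

# Defaults and allowed keys copied from the schema above.
DEFAULTS: Dict[str, Any] = {
    "env": "PROD",
    "connect_uri": "localhost:3000",
    "default_schema": "public",
}
ALLOWED_KEYS = {
    "env",
    "platform_instance_map",
    "connect_uri",
    "username",
    "password",
    "database_alias_map",
    "engine_platform_map",
    "default_schema",
}


def resolve_config(user_config: Dict[str, Any]) -> Dict[str, Any]:
    """Reject unknown keys, then layer user values over the schema defaults."""
    unknown = sorted(set(user_config) - ALLOWED_KEYS)
    if unknown:
        raise ValueError(f"Unknown config keys: {unknown}")
    return {**DEFAULTS, **user_config}


resolved = resolve_config({"username": "metabase-user@example.com"})
print(resolved["default_schema"])
```

A misspelled option such as `hostname` would raise a `ValueError` here instead of being silently ignored.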
Metabase databases are mapped to a DataHub platform based on the engine listed in the api/database response. This mapping can be customized with the engine_platform_map config option. For example, to map databases using the athena engine to the underlying datasets in the glue platform, use the following snippet:
engine_platform_map:
  athena: glue
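The lookup described above can be sketched in a few lines. `resolve_platform` is a hypothetical helper, and the fallback to the engine name when no override exists is an assumption; the plugin may apply its own built-in mappings first.

```python
from typing import Dict, Optional


def resolve_platform(engine: str, engine_platform_map: Optional[Dict[str, str]] = None) -> str:
    """Return the DataHub platform for a Metabase engine, honouring overrides."""
    overrides = engine_platform_map or {}
    # Assumption: with no override, the engine name doubles as the platform name.
    return overrides.get(engine, engine)


custom_map = {"athena": "glue"}
print(resolve_platform("athena", custom_map))
print(resolve_platform("postgres", custom_map))
```

With the mapping from the snippet above, `athena` resolves to `glue`, while `postgres` is left unchanged.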
DataHub tries to determine the database name from the Metabase api/database payload. However, the name can be overridden through database_alias_map for a given database connected to Metabase.
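To make the effect of the override concrete, the sketch below builds a dataset URN from a platform, a database name, and a table. This page does not spell out what the keys of database_alias_map are; the sketch assumes they are the database names as detected from Metabase. The helper names and the `db.schema.table` naming layout are also assumptions for illustration.

```python
from typing import Dict, Optional


def qualify_table(table: str, default_schema: str = "public") -> str:
    """Prefix the table with default_schema when the query gave no schema."""
    return table if "." in table else f"{default_schema}.{table}"


def make_dataset_urn(
    platform: str,
    db_name: str,
    table: str,
    env: str = "PROD",
    database_alias_map: Optional[Dict[str, str]] = None,
    default_schema: str = "public",
) -> str:
    """Build a dataset URN, applying the alias map and the default schema."""
    aliases = database_alias_map or {}
    # Assumption: alias keys are the database names detected from Metabase.
    db = aliases.get(db_name, db_name)
    name = f"{db}.{qualify_table(table, default_schema)}"
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


urn = make_dataset_urn(
    "postgres", "mb_sales", "orders", database_alias_map={"mb_sales": "sales"}
)
print(urn)
```

Getting the database name right matters because lineage only connects when the URN produced here matches the URN of the dataset ingested from the underlying platform.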
Compatibility
Metabase version v0.41.2
Code Coordinates
- Class Name: datahub.ingestion.source.metabase.MetabaseSource
- Browse on GitHub
Questions
If you've got any questions on configuring ingestion for Metabase, feel free to ping us on our Slack.