Metadata
The metadata catalog helps you understand, organize, and discover all your data assets. Know what data you have, where it is, and how to use it.
What is Metadata?
Metadata is information about data:
- What it is (table name, description)
- Where it lives (database, collection)
- Who owns it (team, person)
- How to use it (schema, examples)
- When it updates (frequency, last update)
- Why it exists (business purpose)
Browsing the Catalog
Search and Discover
- Click Metadata in the sidebar
- Search by name, owner, or tag
- Browse categories
- Click any asset for details
Asset Information
View key details about any data asset:
Basic Info
- Name and description
- Type (table, collection, view)
- Owner and steward
- Created and updated dates
Schema
- Column/field names and types
- Required vs optional
- Sample values
- Data quality metrics
Lineage
- Where data comes from
- How it's transformed
- Where it goes
- Dependencies
Usage
- Who uses it
- Which reports/dashboards use it
- Query frequency
- Last access date
Organizing Data Assets
Descriptions
Add meaningful descriptions:
Asset: orders
Description: "All customer orders from our e-commerce platform.
Updated daily at 2 AM UTC. Contains order details, amounts,
and customer IDs. Used by Finance and Sales teams."
Tags and Categories
Tag assets for easy discovery:
- Business domain: Sales, Finance, Operations
- Data quality: Verified, Raw, Experimental
- Access level: Public, Restricted, Internal
- Update frequency: Real-time, Daily, Weekly
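Combining tags narrows a search. The sketch below shows the idea on an in-memory dictionary; the `catalog` structure and the `find_assets` helper are hypothetical illustrations, not part of the product's API.

```python
# Hypothetical in-memory catalog: asset name -> set of tags.
catalog = {
    "orders": {"Sales", "Verified", "Daily"},
    "revenue_daily": {"Finance", "Verified", "Daily"},
    "clickstream_raw": {"Operations", "Raw", "Real-time"},
}

def find_assets(catalog, required_tags):
    """Return the names of assets that carry every tag in required_tags."""
    required = set(required_tags)
    return sorted(name for name, tags in catalog.items() if required <= set(tags))

# Assets that are both verified and refreshed daily.
matches = find_assets(catalog, ["Verified", "Daily"])
```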
Ownership
Assign ownership:
- Owner - Primary contact
- Steward - Responsible for data quality
- Domain expert - Knows the business context
Schema Management
Column Documentation
Document each field:
Column: customer_id
Type: Integer (64-bit)
Nullable: No
Format: Unique identifier
Example: 12345
Business meaning: Uniquely identifies a customer
Used in: Orders, Payments, Returns
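If you keep column documentation in code or configuration, a small record type can mirror the fields above and flag the ones left blank. This is a minimal sketch; the `ColumnDoc` class and its field names are assumptions made for illustration.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class ColumnDoc:
    """Hypothetical record mirroring the documentation fields listed above."""
    name: str
    data_type: str
    nullable: bool
    format: str
    example: object
    business_meaning: str
    used_in: tuple = ()

    def missing_fields(self):
        """Return the names of fields that are empty strings, None, or empty tuples."""
        return [key for key, value in asdict(self).items() if value in ("", None, ())]

customer_id = ColumnDoc(
    name="customer_id",
    data_type="Integer (64-bit)",
    nullable=False,
    format="Unique identifier",
    example=12345,
    business_meaning="Uniquely identifies a customer",
    used_in=("Orders", "Payments", "Returns"),
)
```

Calling `customer_id.missing_fields()` returns an empty list when every field is filled in.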
Data Types and Constraints
- String: Max 255 characters
- Integer (32-bit): Range -2,147,483,648 to 2,147,483,647
- Timestamp: ISO 8601 format
- Boolean: true/false values
- Array: List of values
- Object: Nested structure
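The constraints above can be checked before data is loaded. The following sketch is one way to do it in plain Python; the `is_valid` function and the type-name strings are assumptions, and `datetime.fromisoformat` accepts only a subset of ISO 8601 on Python versions before 3.11.

```python
from datetime import datetime

STRING_MAX_LEN = 255
INT_MIN, INT_MAX = -2_147_483_648, 2_147_483_647

def is_valid(value, type_name):
    """Check a single value against the constraint listed for type_name."""
    if type_name == "String":
        return isinstance(value, str) and len(value) <= STRING_MAX_LEN
    if type_name == "Integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return (isinstance(value, int) and not isinstance(value, bool)
                and INT_MIN <= value <= INT_MAX)
    if type_name == "Timestamp":
        if not isinstance(value, str):
            return False
        try:
            datetime.fromisoformat(value)
            return True
        except ValueError:
            return False
    if type_name == "Boolean":
        return isinstance(value, bool)
    if type_name == "Array":
        return isinstance(value, list)
    if type_name == "Object":
        return isinstance(value, dict)
    raise ValueError(f"unknown type: {type_name}")
```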
Sample Data
View examples of actual data:
customer_id | name | email
------------|-------------------|------------------
12345 | Alice Smith | alice@example.com
12346 | Bob Johnson | bob@example.com
12347 | Carol White | carol@example.com
Data Quality Metrics
Built-in Metrics
Track data quality automatically:
- Completeness - % of non-null values (target: >99%)
- Uniqueness - % of unique values
- Timeliness - How current the data is
- Accuracy - Matches expected ranges and formats
- Consistency - Matches across systems
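To make the first two metrics concrete, here is a minimal sketch of how completeness and uniqueness could be computed for one column. It assumes nulls are represented as `None` and that uniqueness is measured over non-null values only; the catalog's own calculation may differ.

```python
def completeness(values):
    """Percentage of values that are not None (100.0 for an empty column)."""
    if not values:
        return 100.0
    non_null = sum(v is not None for v in values)
    return 100.0 * non_null / len(values)

def uniqueness(values):
    """Percentage of non-null values that are distinct (100.0 if none exist)."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 100.0
    return 100.0 * len(set(non_null)) / len(non_null)

emails = ["alice@example.com", "bob@example.com", None, "alice@example.com"]
print(completeness(emails))  # 3 of 4 values are present
print(uniqueness(emails))    # 2 distinct values among 3 non-null values
```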
Quality Rules
Define what "good" looks like:
Rule: customer_email must be unique
Rule: order_date cannot be in the future
Rule: customer_id must reference valid customer
Rule: price must be > 0
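The four rules above can be expressed as simple checks over rows. This sketch assumes customers and orders are lists of dictionaries with the field names shown; the `find_violations` function and the rule identifiers are hypothetical.

```python
from collections import Counter
from datetime import date

def find_violations(customers, orders, today):
    """Return (rule_name, offending_key) pairs for the four rules above."""
    violations = []
    # Rule: customer_email must be unique.
    email_counts = Counter(c["customer_email"] for c in customers)
    for email, count in email_counts.items():
        if count > 1:
            violations.append(("customer_email_unique", email))
    known_ids = {c["customer_id"] for c in customers}
    for index, order in enumerate(orders):
        # Rule: order_date must not be later than today.
        if order["order_date"] > today:
            violations.append(("order_date_not_in_future", index))
        # Rule: customer_id must reference a known customer.
        if order["customer_id"] not in known_ids:
            violations.append(("customer_id_references_customer", index))
        # Rule: price must be > 0.
        if not order["price"] > 0:
            violations.append(("price_positive", index))
    return violations
```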
Quality Reports
See quality trends:
Last 30 days:
Completeness: 99.8% → 99.9% ↑
Uniqueness: 99.99% (steady)
Accuracy: 99.5% → 99.7% ↑
Alerts: None
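The arrows in a report like this come from comparing two measurements. A minimal sketch, assuming a hypothetical `trend` helper and an arbitrary tolerance of 0.01 percentage points for "steady":

```python
def trend(previous, current, tolerance=0.01):
    """Classify the change between two percentages as 'up', 'down', or 'steady'."""
    delta = current - previous
    if abs(delta) <= tolerance:
        return "steady"
    return "up" if delta > 0 else "down"

report = {
    "Completeness": trend(99.8, 99.9),
    "Uniqueness": trend(99.99, 99.99),
    "Accuracy": trend(99.5, 99.7),
}
```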
Data Governance
Access Control
Define who can use each asset:
- Public - Anyone in the organization
- Restricted - Specific teams only
- Confidential - Limited access
- Sensitive - Requires approval
Retention Policy
How long to keep data:
Historical orders: Keep 7 years (compliance)
Customer emails: Keep 2 years (GDPR)
Session logs: Keep 90 days (storage optimization)
Error logs: Keep 30 days (debugging)
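A retention policy like this can be turned into an expiry check. The sketch below assumes a year is approximated as 365 days and uses hypothetical category keys; a real policy would follow whatever your compliance team defines.

```python
from datetime import date, timedelta

# Retention periods from the policy above, in days (years approximated as 365 days).
RETENTION_DAYS = {
    "historical_orders": 7 * 365,
    "customer_emails": 2 * 365,
    "session_logs": 90,
    "error_logs": 30,
}

def is_expired(category, created, today):
    """Return True when a record is older than its category's retention period."""
    return today - created > timedelta(days=RETENTION_DAYS[category])
```

For example, an error log created on 2024-01-01 is still retained on 2024-01-31 (exactly 30 days old) and becomes eligible for deletion the next day.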
Classification
Classify data by sensitivity:
PII (Personally Identifiable Information):
- Names, emails, phone numbers, addresses
- Requires: Encryption, limited access, audit logging
Financial:
- Bank accounts, credit cards, invoices
- Requires: Encryption, approval workflows, compliance
Public:
- Product names, descriptions, prices
- Available to all
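One way to act on a classification is to compare the controls an asset has enabled against the ones its class requires. The control names and the `missing_controls` helper below are assumptions chosen to match the list above.

```python
# Required controls per classification, taken from the list above.
REQUIRED_CONTROLS = {
    "PII": {"encryption", "limited_access", "audit_logging"},
    "Financial": {"encryption", "approval_workflows", "compliance"},
    "Public": set(),
}

def missing_controls(classification, enabled_controls):
    """Return the required controls that are not yet enabled for an asset."""
    return REQUIRED_CONTROLS[classification] - set(enabled_controls)

# A PII asset with only encryption enabled still needs two more controls.
gaps = missing_controls("PII", ["encryption"])
```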
Data Contracts
Define Expectations
Specify what consuming teams can expect:
Data Asset: revenue_daily
Provider: Finance team
Update frequency: Daily at 5 AM UTC
Schema: revenue (decimal), date (timestamp), region (string)
Quality SLA: 99.5% completeness
Support: finance-data@company.com
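A contract becomes more useful when it can be checked automatically. The following sketch models the contract above as a record and compares it with an observed schema and completeness figure; the `DataContract` class and `contract_breaches` function are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataContract:
    asset: str
    provider: str
    update_frequency: str
    schema: dict               # column name -> type name
    min_completeness_pct: float
    support: str

REVENUE_DAILY = DataContract(
    asset="revenue_daily",
    provider="Finance team",
    update_frequency="Daily at 5 AM UTC",
    schema={"revenue": "decimal", "date": "timestamp", "region": "string"},
    min_completeness_pct=99.5,
    support="finance-data@company.com",
)

def contract_breaches(contract, observed_schema, observed_completeness_pct):
    """List the ways the observed data departs from the contract."""
    breaches = []
    for column, expected in contract.schema.items():
        if column not in observed_schema:
            breaches.append(f"missing column: {column}")
        elif observed_schema[column] != expected:
            breaches.append(
                f"type changed: {column} ({expected} -> {observed_schema[column]})")
    if observed_completeness_pct < contract.min_completeness_pct:
        breaches.append(
            f"completeness {observed_completeness_pct}% below SLA "
            f"{contract.min_completeness_pct}%")
    return breaches
```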
Breaking Changes
Plan for data changes:
⚠️ Breaking Change Alert
Asset: customer_phone
Action: Removing this field on 2024-06-30
Reason: Moving to separate phone_numbers table
Migration: See customer_phone_migration.md
Impact: 3 dashboards, 2 reports, 1 API
Common Workflows
Finding Data
Need customer data?
↓
Search "customer" in Metadata
↓
See all customer-related assets
↓
Filter by quality rating
↓
Choose the best one
↓
Click to see schema
↓
Use in your query
Documenting New Dataset
Created new data?
↓
Click "Add to Catalog"
↓
Fill in name, description, owner
↓
Add schema
↓
Set tags and classification
↓
Add to catalog
↓
Team can discover and use it
Data Quality Investigation
Dashboard shows strange numbers?
↓
Click Metadata
↓
Find source data asset
↓
Check quality metrics
↓
See if metrics degraded
↓
Find owner
↓
Contact the owner to fix it
Advanced Features
Data Profiling
Automatic analysis showing:
Column: age
Type: Integer
Values: 1,234,567
Unique: 82 different values
Min: 18
Max: 99
Average: 42.3
Nulls: 0 (0%)
Distribution: Mostly 30-50 age group
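The profile fields above are straightforward to compute. This is a minimal sketch of a column profiler, assuming nulls are represented as `None`; the `profile_column` function is illustrative and does not reproduce the distribution summary.

```python
from statistics import mean

def profile_column(values):
    """Compute basic profile statistics for a numeric column."""
    non_null = [v for v in values if v is not None]
    nulls = len(values) - len(non_null)
    return {
        "values": len(values),
        "unique": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "average": round(mean(non_null), 1) if non_null else None,
        "nulls": nulls,
        "null_pct": round(100.0 * nulls / len(values), 1) if values else 0.0,
    }

ages = [18, 30, 42, 30, None]
profile = profile_column(ages)
```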
Relationship Mapping
Show how data connects:
customers
↓ foreign key (customer_id)
orders
↓ foreign key (order_id)
order_items
↓ foreign key (product_id)
products
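A relationship map like this can be used to work out how to join two tables. The sketch below stores each foreign key as a tuple and finds a join path with a breadth-first search; which table holds each key is an assumption, and `join_path` is a hypothetical helper.

```python
from collections import deque

# (referencing_table, column, referenced_table); the direction is assumed.
FOREIGN_KEYS = [
    ("orders", "customer_id", "customers"),
    ("order_items", "order_id", "orders"),
    ("order_items", "product_id", "products"),
]

def join_path(foreign_keys, start, goal):
    """Return a list of (from_table, column, to_table) hops, or None if unreachable."""
    neighbors = {}
    for child, column, parent in foreign_keys:
        # Joins work in both directions, so treat each key as an undirected edge.
        neighbors.setdefault(child, []).append((parent, column))
        neighbors.setdefault(parent, []).append((child, column))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for next_table, column in neighbors.get(table, []):
            if next_table not in seen:
                seen.add(next_table)
                queue.append((next_table, path + [(table, column, next_table)]))
    return None

path = join_path(FOREIGN_KEYS, "customers", "products")
```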
Impact Analysis
See what breaks if you change data:
Planning to remove "legacy_id" column?
Impact:
- 5 reports depend on it
- 2 dashboards use it
- 1 integration feeds it
Recommendation: Keep 6 more months before removing
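Impact analysis amounts to walking a dependency graph downstream from the thing you plan to change. A minimal sketch, assuming a hypothetical `dependents` mapping from each asset to the assets that read from it:

```python
from collections import deque

def downstream(dependents, start):
    """Return every asset reachable from start by following dependents edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    seen.discard(start)  # a cycle should not make the asset depend on itself
    return seen

# Hypothetical graph: the column feeds a view, which feeds a report and a dashboard.
dependents = {
    "customers.legacy_id": ["customer_view"],
    "customer_view": ["churn_report", "sales_dashboard"],
    "sales_dashboard": [],
}
affected = downstream(dependents, "customers.legacy_id")
```

Here `affected` contains the view plus both assets that sit behind it, including indirect dependents that a one-level lookup would miss.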
Best Practices
Keep Descriptions Updated
- Update when schema changes
- Add examples
- Document business rules
Use Consistent Naming
- snake_case for columns
- Meaningful names
- Avoid abbreviations
Tag Strategically
- Use consistent tag names
- Don't over-tag
- Combine tags for filtering
Set Quality Rules
- Define data quality expectations
- Monitor compliance
- Alert on degradation
Troubleshooting
Can't find data I need
- Try different search terms
- Browse by category
- Ask the data owner
- Check Lineage for dependencies
Data quality degraded
- Check what changed
- Review transformations
- Look at source data
- Contact the data owner
Schema mismatch
- Compare with documentation
- Check for recent changes
- Verify the integration is updated
Related Topics
- Lineage - See where data comes from
- Activity Logs - Track metadata changes
- Database Clusters - Store your data