Data Catalog

  • Alation Overview

    • https://www.youtube.com/watch?v=sPqeMCvW8TE&t=61s&ab_channel=GreatDataMinds

    • Repository of metadata that supports data governance, collaboration, and analysis

    • Contains data sets, reports, and queries covering all the information stored in a data lake

    • Glossaries, lineage

    • Helps users find information and judge whether it is stale - the goal is a single source of truth

      • Data literacy - proper interpretation

      • Data governance - responsibility, authority

    • Why now?

      • Data explosion

      • Changing workforce

      • Evolving data privacy laws

    • Use cases

      • Analytics, governance, cloud migration, data privacy/GDPR, risk and compliance, digital transformation

      • American Family Insurance is a big customer

    • Catalogs

    • Follow

Alation

  • Universal search bar

  • Users can find, star, and watch information; functional resources

  • Query - look for assets (tables, columns, schemas, BI reports) and for user conversations in the catalog; initially focused on data analyst productivity

  • Lets users deprecate reports; gives a steward ownership of the data; shows specific lineage of where data comes from and where it goes

  • Warnings - tell users how to use the data and whether it may or may not be distributed

  • Lineage - use it to reverse-engineer where the data comes from; see who the top users of an individual data set are; stewards can govern the data for HIPAA/PII

  • Queries become assets in the catalog; warnings are carried all the way through to the SQL query IDE; joins and filters are also captured in the data catalog

  • See most frequently queried columns; good for knowing which columns to migrate

  • Lineage - shows the impact on all downstream assets; for example, an upstream staging table may be feeding data; helps determine what really needs to be migrated

  • All assets need to be ingested into the catalog; REST API for lineage

  • Matching to other columns and aliases; machine learning looks for suggested terms; if the assets are in the catalog and discoverable, there is a high probability that Alation will make the connection

  • Data governance implementation

    • We enable visual governance - stewards can take ownership in the catalog; some data tables don't need stewards, but often you need to attach them; the stewardship dashboard lets you add stewards and review them; you can send messages to data analysts in the platform

  • Support for semi-structured and unstructured data

    • We can make it a catalog source if it can be described with a little bit of JSON

  • Follow up questions

    • How much effort / commitment does it take to get data into Alation?

    • What percentage of people using the platform are BI analysts, data analysts, and business leaders?

    • New connectors or new products or new features?

    • Future - data privacy, data discovery, data sharing, and data acquisition

    • https://www.youtube.com/watch?v=BWWFImMibxM

  • Reason for building

    • It was an inventory management system held by IT

    • Data governance spurred by regulations

  • New application

    • Data governance off the ground much quicker than ever before

    • Automated ingestion and automatic role definition; policies no longer have to be manually entered and centralized

  • Multi-cloud governance and security

    • Differences can diversify IT portfolios

    • Automate / ML - at the point of usage, people know what they are allowed to do; they know policy has been applied to the database

    • Human brilliance -

    • Snowflake - on the roadmap; policies and controls flow from Snowflake into Alation; another plane for managing data inside the enterprise

    • Federated data governance; data sharing as well - getting up and running quickly because queries may already be built

    • Single source of reference
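The lineage notes describe two traversals. One is "reverse engineer where the data is coming from." The other is "shows impact of all downstream assets," which tells you what a change to an upstream staging table would touch. Both are plain graph walks. The sketch below is a minimal Python version and makes some assumptions. It assumes lineage has already been exported as (upstream, downstream) edge pairs. The asset names and the helper functions `build_graph` and `downstream_impact` are hypothetical illustrations, not Alation's REST API.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges (upstream asset -> downstream asset).
# Names are illustrative only; real edges would come from the catalog.
EDGES = [
    ("staging.orders_raw", "warehouse.orders"),
    ("warehouse.orders", "warehouse.daily_sales"),
    ("warehouse.daily_sales", "bi.sales_dashboard"),
    ("warehouse.customers", "bi.sales_dashboard"),
]


def build_graph(edges):
    """Map each asset to the set of assets it feeds directly."""
    graph = defaultdict(set)
    for upstream, downstream in edges:
        graph[upstream].add(downstream)
    return graph


def downstream_impact(graph, asset):
    """Return every asset reachable from `asset`, using breadth-first search."""
    seen = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        # .get avoids inserting empty entries into the defaultdict
        for child in graph.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    graph = build_graph(EDGES)
    # Everything affected if the staging table changes or is migrated
    print(sorted(downstream_impact(graph, "staging.orders_raw")))
    # Reversing the edges answers "where does this report's data come from?"
    reverse = build_graph((d, u) for u, d in EDGES)
    print(sorted(downstream_impact(reverse, "bi.sales_dashboard")))
```

Reusing one traversal on the reversed edge list keeps the upstream and downstream views consistent. With these sample edges, the first call returns the three assets fed by the staging table. The second call returns all four sources behind the dashboard.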