🗃️ Jitsu

Jitsu (Code) is an open source, MIT-licensed alternative to Segment, ready to ingest all of your data and route it wherever it should go.

Jitsu homepage (source: jitsu.com)

Jitsu has been around for a while and is backed by Y Combinator, so it has some staying power (and funding). It is also MIT licensed, which makes it truly open source.

Jitsu collects event data (usually analytics data, the kind you would otherwise send to Google Analytics) and forwards it to whichever destinations you have set up:

How Jitsu works (source: jitsu.com)
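To make the capture half of that picture concrete, here is a minimal TypeScript sketch of the HTTP request a client might send to a self-hosted Jitsu ingest service (published on port 8080 by default in the Docker setup later in this post). The `/api/s/{type}` path, the `X-Write-Key` header, the payload shape and the write key value are assumptions to check against Jitsu's HTTP API documentation, not a confirmed reference.

```typescript
// Sketch only: the `/api/s/{type}` path, the `X-Write-Key` header and the
// Segment-style body fields are ASSUMPTIONS to verify against Jitsu's docs.

type EventType = "page" | "track" | "identify";

interface CaptureEvent {
  type: EventType;
  event?: string; // event name, used by "track" events
  userId?: string;
  anonymousId?: string;
  properties?: Record<string, unknown>;
  timestamp: string; // ISO-8601
}

interface CaptureRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds (but does not send) the HTTP request, so it is easy to inspect.
function buildCaptureRequest(
  ingestBase: string,
  writeKey: string,
  ev: CaptureEvent,
): CaptureRequest {
  // Strip trailing slashes so "http://host/" and "http://host" behave the same.
  const base = ingestBase.replace(/\/+$/, "");
  return {
    url: base + "/api/s/" + ev.type,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "X-Write-Key": writeKey },
      body: JSON.stringify(ev),
    },
  };
}

const req = buildCaptureRequest("http://localhost:8080/", "hypothetical-write-key", {
  type: "track",
  event: "signup_clicked",
  anonymousId: "anon-123",
  properties: { plan: "free" },
  timestamp: new Date().toISOString(),
});
console.log(req.url); // http://localhost:8080/api/s/track
// To send it for real (Node 18+ ships a global fetch):
// await fetch(req.url, req.init);
```

Building the request separately from sending it keeps the payload easy to inspect and test before pointing it at a live instance.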

🌠 Features

If you've used Segment, you'll love Jitsu:

  • Captures events (e.g. analytics) from your apps and web pages
  • Stores events in a variety of backends (ClickHouse, Postgres, Redshift and more)
  • Routes events to different destinations
  • Helps you manage and resolve user identity
  • Supports deploying behind a custom domain
  • Transforms data as it is streamed
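The identity bullet is the least obvious of these. Conceptually, identity resolution links the anonymous IDs a visitor accumulates (one per browser or device) to a single user ID once an identify-style event connects them. The sketch below uses a plain union-find (disjoint-set) structure to show the idea; it is a generic illustration, not Jitsu's actual implementation, and the class name and IDs are made up.

```typescript
// Generic union-find sketch of identity stitching. NOT Jitsu's implementation;
// it only shows how linked identifiers collapse into one representative.
class IdentityGraph {
  private parent = new Map<string, string>();

  // Returns the representative ID for `id`, compressing the path as it goes.
  find(id: string): string {
    if (!this.parent.has(id)) this.parent.set(id, id);
    let root = id;
    while (this.parent.get(root) !== root) root = this.parent.get(root)!;
    // Path compression: point every node on the path directly at the root.
    let cur = id;
    while (cur !== root) {
      const next = this.parent.get(cur)!;
      this.parent.set(cur, root);
      cur = next;
    }
    return root;
  }

  // Records that two identifiers belong to the same person.
  // The second argument's group becomes the representative.
  link(a: string, b: string): void {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra !== rb) this.parent.set(ra, rb);
  }

  sameUser(a: string, b: string): boolean {
    return this.find(a) === this.find(b);
  }
}

const graph = new IdentityGraph();
graph.link("anon-laptop", "user-42"); // user logs in on a laptop
graph.link("anon-phone", "user-42"); // same user logs in on a phone
console.log(graph.sameUser("anon-laptop", "anon-phone")); // true
console.log(graph.sameUser("anon-laptop", "anon-tablet")); // false
```

Passing the known user ID as the second argument to `link` keeps it as the group's representative, so `find` on any anonymous ID returns the user ID.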

🤖 As described by AI

Jitsu, available at jitsu.com, is an open-source platform designed for the efficient collection of event data from a wide array of sources such as web, apps, email, chatbots, and CRM systems into data warehouses. It emphasizes ease of setup, similar to adding a Google Analytics tag, and aims to make data collection as quick and secure as possible. The platform promotes a unified approach to data handling, ensuring that the data warehouse becomes the single source of truth for all data, supporting real-time event streaming for immediate analysis readiness, and offering developer flexibility through Jitsu Functions. These functions allow for the modification, filtering, or augmentation of events using a JavaScript runtime environment, tapping into an extensive ecosystem of tools and libraries.
Jitsu's implementation process is streamlined into three straightforward steps: capture, store, and discover. These cover capturing events from every engagement point, keeping the data in your own warehouse for data autonomy, and analyzing it flexibly to understand user behavior. Jitsu also distinguishes itself with features like automatic real-time construction of a user identity graph, inclusion of ClickHouse for data warehousing, and custom domain deployment to minimize the impact of ad blockers. This makes Jitsu a compelling choice for businesses and developers seeking a comprehensive, open-source solution for real-time data collection and analysis.
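The Jitsu Functions idea from that description (modifying, filtering or augmenting each event in a JavaScript runtime) can be sketched as a plain TypeScript function. The event shape, the name `transform`, and the convention of returning `"drop"` to discard an event are assumptions loosely modelled on Jitsu Functions rather than its exact API; real functions also receive a context object and can be async, while this one is kept synchronous for brevity.

```typescript
// Hypothetical event shape; real Jitsu events carry many more fields.
interface JitsuEvent {
  type: string;
  event?: string;
  userId?: string;
  properties?: Record<string, unknown>;
  context?: { ip?: string; [key: string]: unknown };
}

// ASSUMPTION: loosely modelled on Jitsu Functions, where returning "drop"
// discards the event and returning an event passes it on downstream.
function transform(event: JitsuEvent): JitsuEvent | "drop" {
  // Filter: discard internal test traffic.
  if (event.userId !== undefined && event.userId.startsWith("test-")) {
    return "drop";
  }
  // Modify: remove the IP address before it reaches the warehouse.
  const context = { ...event.context };
  delete context.ip;
  // Augment: attach a derived property.
  const properties = { ...event.properties, processedAt: new Date().toISOString() };
  return { ...event, context, properties };
}

console.log(transform({ type: "track", userId: "test-7" })); // drop
```

Copying `context` and `properties` instead of mutating them keeps the input event untouched, which makes a function like this easy to test in isolation.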

📺 Watch this

At the end of 2023, Jitsu embarked on a new phase of its product development, introducing Jitsu Next:

Introducing Jitsu Next
Jitsu introduces Jitsu.Next, the fastest and most efficient way to capture event data into your data stack. New UI, new core, unlimited incoming events on all plans including a free tier and many other amazing features.

The new product looks stunning and ships with many more features, so kudos to the team for getting such large changes out the door!

👟 Getting started

Jitsu is written in TypeScript, and while Node.js applications can be tricky to deploy at times, the project has great documentation on getting started that is well worth a read.

That said, Jitsu has a lot of moving pieces (console, bulker, rotor, ingest and syncctl, on top of Postgres, Redis and Kafka), so it is more complex to deploy than many other applications.

Deploying with Docker

Fortunately, Jitsu is Docker-friendly, which makes deployment much easier. There are still many moving pieces, so it is worth reading through the docker-compose.yml in the repository:

jitsu/docker/docker-compose.yml at newjitsu · jitsucom/jitsu
Jitsu is an open-source Segment alternative. Fully-scriptable data ingestion engine for modern data teams. Set-up a real-time data pipeline in minutes, not days - jitsucom/jitsu

The part of docker-compose.yml that covers Jitsu's own services is reproduced below for easier reading (the Kafka, Redis and Postgres services are omitted):

version: "3.8"
services:
  # ... removed message broker service (kafka) ...
  
  # ... removed cache service (redis) ...
  
  # ... removed db service (postgres) ...

  console:
    tty: true
    image: jitsucom/console:${DOCKER_TAG:-latest}
    restart: "unless-stopped"
    platform: linux/amd64
    environment:
      ROTOR_URL: "http://rotor:3401"
      ROTOR_AUTH_KEY: ${BULKER_TOKEN:-default}
      BULKER_URL: "http://bulker:3042"
      CONSOLE_RAW_AUTH_TOKENS: ${CONSOLE_TOKEN:-default}
      BULKER_AUTH_KEY: ${BULKER_TOKEN:-default}
      MIT_COMPLIANT: ${MIT_COMPLIANT:-false}
      DATABASE_URL: "postgresql://postgres:${POSTGRES_PASSWORD:-default}@postgres:5432/postgres?schema=newjitsu"
      REDIS_URL: "redis://default:${REDIS_PASSWORD:-default}@redis:6379"
      SEED_USER_EMAIL: ${SEED_USER_EMAIL:-}
      SEED_USER_PASSWORD: ${SEED_USER_PASSWORD:-}
      ENABLE_CREDETIALS_LOGIN: ${ENABLE_CREDETIALS_LOGIN:-true}
      GITHUB_CLIENT_ID: ${GITHUB_CLIENT_ID}
      GITHUB_CLIENT_SECRET: ${GITHUB_CLIENT_SECRET}
      SYNCS_ENABLED: ${SYNCS_ENABLED:-false}
      SYNCCTL_URL: "http://syncctl:${EXTERNAL_SYNCS_PORT:-3043}"
      SYNCCTL_AUTH_KEY: ${SYNCCTL_TOKEN:-default}
      GOOGLE_SCHEDULER_KEY: ${GOOGLE_SCHEDULER_KEY}
      JITSU_INGEST_PUBLIC_URL: "${JITSU_INGEST_PUBLIC_URL:-http://localhost:${JITSU_INGEST_PORT:-8080}/}"
      JITSU_PUBLIC_URL: "${JITSU_PUBLIC_URL:-${NEXTAUTH_URL:-http://localhost:${JITSU_UI_PORT:-3000}/}}"
      NEXTAUTH_URL: "${JITSU_PUBLIC_URL:-${NEXTAUTH_URL:-http://localhost:${JITSU_UI_PORT:-3000}/}}"
      UPDATE_DB: "true"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://console:3000/api/healthcheck"]
      interval: 2s
      timeout: 10s
      retries: 30
    extra_hosts:
      - "syncctl:host-gateway"
    depends_on:
      redis:
        condition: service_started
      postgres:
        condition: service_healthy
    ports:
      - "${JITSU_UI_PORT:-3000}:3000"

  sync-catalog-init:
    tty: true
    image: curlimages/curl
    restart: "on-failure"
    environment:
      CONSOLE_TOKEN: ${CONSOLE_TOKEN:-default}
    command: "curl --silent --output /dev/null --show-error -H 'Authorization: Bearer service-admin-account:${CONSOLE_TOKEN:-default}' http://console:3000/api/admin/catalog-refresh?initial=true"
    depends_on:
      console:
        condition: service_healthy

  bulker:
    tty: true
    image: jitsucom/bulker:${DOCKER_TAG:-latest}
    platform: linux/amd64
    restart: "unless-stopped"
    environment:
      TERM: "xterm-256color"
      BULKER_KAFKA_BOOTSTRAP_SERVERS: "kafka:9092"
      BULKER_RAW_AUTH_TOKENS: ${BULKER_TOKEN:-default}
      BULKER_CONFIG_SOURCE: "http://console:3000/api/admin/export/bulker-connections"
      BULKER_CONFIG_SOURCE_HTTP_AUTH_TOKEN: "service-admin-account:${CONSOLE_TOKEN:-default}"
      BULKER_CACHE_DIR: "/tmp/cache"
      REDIS_URL: "redis://default:${REDIS_PASSWORD:-default}@redis:6379"
      BULKER_INTERNAL_TASK_LOG: '{"id":"task_log","metricsKeyPrefix":"syncs","usesBulker":true,"type":"postgres","options":{"mode":"stream"},"credentials":{"host":"postgres","port":5432,"sslMode":"disable","database":"postgres","password":"${POSTGRES_PASSWORD:-default}","username":"postgres","defaultSchema":"newjitsu"}}'
    healthcheck:
      test: ["CMD", "curl", "-f", "http://bulker:3042/health"]
      interval: 2s
      timeout: 10s
      retries: 15
    ports:
      - "${EXTERNAL_BULKER_PORT:-3042}:3042"
    depends_on:
      console:
        condition: service_healthy
      kafka:
        condition: service_healthy

  rotor:
    tty: true
    image: jitsucom/rotor:${DOCKER_TAG:-latest}
    platform: linux/amd64
    restart: "unless-stopped"
    environment:
      ROTOR_RAW_AUTH_TOKENS: ${BULKER_TOKEN:-default}
      BULKER_URL: "http://bulker:3042"
      BULKER_AUTH_KEY: ${BULKER_TOKEN:-default}
      KAFKA_BOOTSTRAP_SERVERS: "kafka:9092"
      REDIS_URL: "redis://default:${REDIS_PASSWORD:-default}@redis:6379"
      REPOSITORY_BASE_URL: "http://console:3000/api/admin/export/"
      REPOSITORY_AUTH_TOKEN: "service-admin-account:${CONSOLE_TOKEN:-default}"
      REPOSITORY_CACHE_DIR: "/tmp/cache"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://rotor:3401/health"]
      interval: 5s
      timeout: 10s
      retries: 15
    depends_on:
      console:
        condition: service_healthy
      bulker:
        condition: service_healthy
  #    ports:
  #      - "3401:3401"

  ingest:
    tty: true
    image: jitsucom/ingest:${DOCKER_TAG:-latest}
    platform: linux/amd64
    restart: "unless-stopped"
    environment:
      TERM: "xterm-256color"
      INGEST_PUBLIC_URL: "${JITSU_INGEST_PUBLIC_URL:-http://localhost:${JITSU_INGEST_PORT:-8080}/}"
      INGEST_KAFKA_BOOTSTRAP_SERVERS: "kafka:9092"
      INGEST_RAW_AUTH_TOKENS: ${BULKER_TOKEN:-default}
      INGEST_REPOSITORY_URL: "http://console:3000/api/admin/export/streams-with-destinations"
      INGEST_SCRIPT_ORIGIN: "http://console:3000/api/s/javascript-library"
      INGEST_REPOSITORY_AUTH_TOKEN: "service-admin-account:${CONSOLE_TOKEN:-default}"
      INGEST_CACHE_DIR: "/tmp/cache"
      INGEST_REDIS_URL: "redis://default:${REDIS_PASSWORD:-default}@redis:6379"
      INGEST_ROTOR_URL: "http://rotor:3401"
      INGEST_ROTOR_AUTH_KEY: ${BULKER_TOKEN:-default}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://ingest:3049/health"]
      interval: 2s
      timeout: 10s
      retries: 15
    depends_on:
      console:
        condition: service_healthy
      rotor:
        condition: service_started
    ports:
      - "${JITSU_INGEST_PORT:-8080}:3049"

  syncctl:
    tty: true
    image: jitsucom/syncctl:${DOCKER_TAG:-latest}
    platform: linux/amd64
    restart: "on-failure"
    environment:
      TERM: "xterm-256color"
      HTTP_PORT: ${EXTERNAL_SYNCS_PORT:-3043}
      SYNCCTL_SYNCS_ENABLED: ${SYNCS_ENABLED:-false}
      SYNCCTL_RAW_AUTH_TOKENS: ${SYNCCTL_TOKEN:-default}
      SYNCCTL_DATABASE_URL: "postgresql://postgres:${POSTGRES_PASSWORD:-default}@127.0.0.1:${EXTERNAL_POSTGRES_PORT:-5432}/postgres?search_path=newjitsu"
      SYNCCTL_SIDECAR_DATABASE_URL: "postgresql://postgres:${POSTGRES_PASSWORD:-default}@${EXTERNAL_POSTGRES_HOST}:${EXTERNAL_POSTGRES_PORT:-5432}/postgres?search_path=newjitsu"
      SYNCCTL_BULKER_URL: "http://${EXTERNAL_BULKER_HOST}:${EXTERNAL_BULKER_PORT:-3042}"
      SYNCCTL_BULKER_AUTH_TOKEN: ${BULKER_TOKEN:-default}
      SYNCCTL_KUBERNETES_CLIENT_CONFIG: "${SYNCCTL_KUBERNETES_CLIENT_CONFIG:-local}"
      SYNCCTL_KUBERNETES_CONTEXT: "${SYNCCTL_KUBERNETES_CONTEXT}"
    network_mode: "host"
    depends_on:
      bulker:
        condition: service_healthy
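Most of the values above rely on Compose's `${NAME:-default}` interpolation, which substitutes the default when the variable is unset or empty, and some defaults nest (the console's public URL falls back to `NEXTAUTH_URL`, then to `localhost` on `JITSU_UI_PORT`). The sketch below is a small, hypothetical resolver for just those two forms, handy for seeing what a given set of variables will produce and for spotting secrets such as `BULKER_TOKEN` that would silently stay at the literal value `default`. It is not how Docker Compose itself is implemented, and it ignores the other interpolation forms Compose supports.

```typescript
// Minimal resolver for the two Compose interpolation forms used above:
// ${NAME} and ${NAME:-default}, where the default may itself contain ${...}.
// Other Compose forms (${NAME-default}, ${NAME:?error}, $$ escapes) are ignored.

type Env = Record<string, string | undefined>;

function interpolate(input: string, env: Env): string {
  let out = "";
  let i = 0;
  while (i < input.length) {
    if (!input.startsWith("${", i)) {
      out += input[i];
      i++;
      continue;
    }
    // Find the closing brace that matches this "${", allowing nesting.
    let depth = 0;
    let j = i;
    for (; j < input.length; j++) {
      if (input.startsWith("${", j)) {
        depth++;
        j++; // skip the "{" as well
      } else if (input[j] === "}" && --depth === 0) {
        break;
      }
    }
    if (j >= input.length) throw new Error("Unclosed ${ at index " + i);
    const expr = input.slice(i + 2, j);
    const sep = expr.indexOf(":-");
    const name = sep === -1 ? expr : expr.slice(0, sep);
    const value = env[name];
    if (sep !== -1 && (value === undefined || value === "")) {
      out += interpolate(expr.slice(sep + 2), env); // unset or empty: use the default
    } else {
      out += value ?? ""; // plain ${NAME} with no value becomes ""
    }
    i = j + 1;
  }
  return out;
}

// Secrets in the file that would silently fall back to the literal "default".
function secretsLeftAtDefault(env: Env): string[] {
  const names = ["BULKER_TOKEN", "CONSOLE_TOKEN", "SYNCCTL_TOKEN", "POSTGRES_PASSWORD", "REDIS_PASSWORD"];
  return names.filter((n) => interpolate("${" + n + ":-default}", env) === "default");
}

const publicUrl = "${JITSU_PUBLIC_URL:-${NEXTAUTH_URL:-http://localhost:${JITSU_UI_PORT:-3000}/}}";
console.log(interpolate(publicUrl, {})); // http://localhost:3000/
console.log(interpolate(publicUrl, { JITSU_UI_PORT: "4000" })); // http://localhost:4000/
console.log(secretsLeftAtDefault({ BULKER_TOKEN: "s3cret" }).join(", ")); // CONSOLE_TOKEN, SYNCCTL_TOKEN, POSTGRES_PASSWORD, REDIS_PASSWORD
```

With no variables set at all, every token and password in the file resolves to `default`, so setting real values is worth doing before exposing an instance.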

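That is a lot of configuration for quite a complex system, but the design should scale well: receiving events (ingest), processing and routing them (rotor), loading them into destinations (bulker) and running syncs (syncctl) are separate services, so the heavy operations are kept apart and can be scaled independently.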
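Because each of those services runs (and fails) independently, it helps to check them one by one after starting the stack. The compose healthchecks run inside the Docker network, but console, bulker and ingest are also published on the host (ports 3000, 3042 and 8080 by default), while rotor's port mapping is commented out. The sketch below polls those host URLs with a retry loop; the endpoint list simply mirrors the defaults above, and the injectable `probe` function is an illustrative design choice, not part of Jitsu.

```typescript
// Readiness check for the services the compose file publishes on the host.
// URLs mirror the default ports/healthchecks above; rotor is not published
// (its ports entry is commented out), so it is not probed here.

type Probe = (url: string) => Promise<boolean>;

const endpoints: Array<[string, string]> = [
  ["console", "http://localhost:3000/api/healthcheck"],
  ["bulker", "http://localhost:3042/health"],
  ["ingest", "http://localhost:8080/health"], // host 8080 -> container 3049
];

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls one URL until the probe succeeds or the retries run out.
async function waitFor(url: string, probe: Probe, retries = 15, intervalMs = 2000): Promise<boolean> {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      if (await probe(url)) return true;
    } catch {
      // e.g. connection refused while the container is still starting
    }
    if (attempt < retries) await sleep(intervalMs);
  }
  return false;
}

// Checks every endpoint in parallel and reports which ones came up.
async function checkAll(probe: Probe, retries?: number, intervalMs?: number): Promise<Record<string, boolean>> {
  const results: Record<string, boolean> = {};
  await Promise.all(
    endpoints.map(async ([name, url]) => {
      results[name] = await waitFor(url, probe, retries, intervalMs);
    }),
  );
  return results;
}

// A real probe could use the global fetch in Node 18+:
// const httpProbe: Probe = async (url) => (await fetch(url)).ok;
// checkAll(httpProbe).then((r) => console.log(r));
```

Injecting `probe` keeps the retry logic testable without a running stack; the default of 15 retries at 2-second intervals roughly mirrors the bulker and ingest healthchecks in the compose file.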

🧑‍💻 Want to contribute?

Jitsu is open to contributions and has a well-maintained issue tracker:

Issues · jitsucom/jitsu

Of course, the first place to start is their contributing guidelines:

jitsu/CONTRIBUTING.md at newjitsu · jitsucom/jitsu