Jitsu
Jitsu (Code) is an open-source alternative to Segment, ready to ingest all your data and route it wherever it should go.
Jitsu has been around for a while and is backed by Y Combinator, so it has some staying power (and funding), and its MIT license makes it truly open source.
Jitsu collects event data (usually analytics data, the kind you might otherwise send to Google Analytics) and forwards it to whichever destinations you have set up.
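Conceptually, that forwarding step is a fan-out: each incoming event is copied to every destination connected to it, and each destination may only want some of the events. The sketch below illustrates only the idea. The `AnalyticsEvent` and `Destination` shapes and the `route` function are hypothetical and are not part of Jitsu's API.

```typescript
// Hypothetical shapes for illustration; these are not Jitsu's types.
type AnalyticsEvent = {
  type: string;
  event?: string;
  userId?: string;
  properties?: Record<string, unknown>;
};

interface Destination {
  name: string;
  // Optional per-destination filter; events it rejects are skipped.
  accepts?: (e: AnalyticsEvent) => boolean;
  deliver: (e: AnalyticsEvent) => void;
}

// Fan one incoming event out to every destination that accepts it and
// return the names of the destinations that received it.
function route(event: AnalyticsEvent, destinations: Destination[]): string[] {
  const delivered: string[] = [];
  for (const d of destinations) {
    if (d.accepts && !d.accepts(event)) continue;
    // Deep-copy (via JSON) so one destination cannot mutate another's data.
    d.deliver(JSON.parse(JSON.stringify(event)) as AnalyticsEvent);
    delivered.push(d.name);
  }
  return delivered;
}

// Usage: a warehouse that takes everything and a CRM that only wants identify calls.
const warehouse: AnalyticsEvent[] = [];
const crm: AnalyticsEvent[] = [];
const destinations: Destination[] = [
  { name: "warehouse", deliver: (e) => { warehouse.push(e); } },
  { name: "crm", accepts: (e) => e.type === "identify", deliver: (e) => { crm.push(e); } },
];
route({ type: "track", event: "signup" }, destinations);
route({ type: "identify", userId: "user-42" }, destinations);
```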
Features
If you've used Segment, you'll love Jitsu:
- Captures events (e.g. analytics) from your apps and web pages
- Stores analytics in a variety of backends (ClickHouse, Postgres, Redshift and more)
- Routes events to different backends
- Helps manage and resolve user identity
- Supports deployment behind a custom domain
- Allows data to be transformed as it is streamed
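As a rough illustration of the capture side, the sketch below builds a Segment-style `track` request for a self-hosted ingest service. The endpoint path (`/api/s/track`) and the `X-Write-Key` header are assumptions based on Jitsu's Segment-compatible HTTP API and should be checked against the current Jitsu docs. `buildTrackRequest` is a hypothetical helper, and the port is the default `JITSU_INGEST_PORT` (8080) from the compose file further down.

```typescript
// Shape of the request we build; sending it is left to the caller.
interface TrackRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds a Segment-style "track" request for Jitsu's ingest service.
// ASSUMPTION: the ingest service accepts POST <base>/api/s/track with the write key
// in an X-Write-Key header; verify both against the Jitsu HTTP API docs.
function buildTrackRequest(
  ingestBaseUrl: string,
  writeKey: string,
  event: string,
  properties: Record<string, unknown>,
  anonymousId: string,
): TrackRequest {
  // Strip trailing slashes so "http://host/" and "http://host" give the same URL.
  const base = ingestBaseUrl.replace(/\/+$/, "");
  const payload = { type: "track", event, properties, anonymousId, timestamp: new Date().toISOString() };
  return {
    url: `${base}/api/s/track`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "X-Write-Key": writeKey },
      body: JSON.stringify(payload),
    },
  };
}

// Usage against a local install; "your-write-key" is a placeholder for a real write key.
const req = buildTrackRequest("http://localhost:8080/", "your-write-key", "signup", { plan: "free" }, "anon-123");
// To actually send it (Node 18+ ships a global fetch):
// await fetch(req.url, req.init);
```

Building the request separately from sending it keeps the payload easy to inspect and test before anything touches the network.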
🤖 As described by AI
Jitsu, available at jitsu.com, is an open-source platform for efficiently collecting event data from a wide array of sources, such as web, apps, email, chatbots, and CRM systems, into data warehouses. It emphasizes ease of setup, comparable to adding a Google Analytics tag, and aims to make data collection as quick and secure as possible. The platform promotes a unified approach to data handling in which the data warehouse becomes the single source of truth. It supports real-time event streaming, so data is ready for analysis immediately, and offers developer flexibility through Jitsu Functions. These functions let you modify, filter, or augment events in a JavaScript runtime, tapping into its extensive ecosystem of tools and libraries.
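To make the modify/filter/augment idea concrete, here is a sketch of a per-event transformation written as a plain, locally testable function. It assumes a simple contract: return the (possibly modified) event to keep it, or the string `"drop"` to discard it. This contract is modeled loosely on how Jitsu Functions are described, so check the Jitsu Functions documentation for the exact signature and context object before porting it. The `JitsuLikeEvent` type and `transform` name are hypothetical.

```typescript
// Local stand-in types (assumptions, not imported from Jitsu).
type JitsuLikeEvent = {
  type: string;
  event?: string;
  context?: { ip?: string; page?: { url?: string } };
  properties?: Record<string, unknown>;
};
// Assumed contract: return an event to keep it, "drop" to discard it.
type FunctionResult = JitsuLikeEvent | "drop";

function transform(event: JitsuLikeEvent): FunctionResult {
  // Filter: discard events generated during local development.
  if ((event.context?.page?.url ?? "").includes("localhost")) {
    return "drop";
  }
  // Modify: strip the client IP (on a copy, so the input is not mutated).
  const context = { ...event.context };
  delete context.ip;
  // Augment: add a property that downstream queries can group by.
  return { ...event, context, properties: { ...event.properties, environment: "production" } };
}

// Usage
const result = transform({
  type: "page",
  context: { ip: "203.0.113.7", page: { url: "https://example.com/pricing" } },
});
console.log(result);
```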
Jitsu's implementation process is streamlined into three steps: capture, store, and discover. Together these cover capturing events from every engagement point, storing them in your own data warehouse so you keep control of your data, and analyzing them flexibly to understand user behavior. Jitsu also stands out with features like automatic, real-time construction of a user identity graph, ClickHouse included for data warehousing, and custom-domain deployment to reduce the impact of ad blockers. This makes Jitsu a compelling choice for businesses and developers looking for a comprehensive, open-source solution for real-time data collection and analysis.
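The idea behind an identity graph can be illustrated with a union-find (disjoint-set) structure. Every anonymous ID and user ID is a node, and any event that carries both links them into one profile, so a visitor's pre-login activity on several devices ends up attached to the same user. This is a conceptual sketch of identity stitching, not Jitsu's actual implementation; `IdentityGraph` is a hypothetical name.

```typescript
// Conceptual identity stitching with union-find (disjoint sets).
class IdentityGraph {
  private parent = new Map<string, string>();

  // Find the canonical representative of an id, registering unseen ids.
  private find(id: string): string {
    if (!this.parent.has(id)) this.parent.set(id, id);
    let root = id;
    while (this.parent.get(root) !== root) root = this.parent.get(root)!;
    // Path compression: point every node on the walk directly at the root.
    let node = id;
    while (node !== root) {
      const next = this.parent.get(node)!;
      this.parent.set(node, root);
      node = next;
    }
    return root;
  }

  // Record that two identifiers belong to the same person.
  link(a: string, b: string): void {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra !== rb) this.parent.set(ra, rb);
  }

  // Observe an event: if it carries both an anonymous id and a user id, merge them.
  observe(event: { anonymousId?: string; userId?: string }): void {
    if (event.anonymousId && event.userId) this.link(event.anonymousId, event.userId);
    else if (event.anonymousId) this.find(event.anonymousId);
    else if (event.userId) this.find(event.userId);
  }

  sameUser(a: string, b: string): boolean {
    return this.find(a) === this.find(b);
  }
}

// Usage
const graph = new IdentityGraph();
graph.observe({ anonymousId: "anon-laptop" });                    // browsing before login
graph.observe({ anonymousId: "anon-laptop", userId: "user-42" }); // login links laptop to user
graph.observe({ anonymousId: "anon-phone", userId: "user-42" });  // same user on another device
console.log(graph.sameUser("anon-laptop", "anon-phone")); // true
```

Union by rank is omitted for brevity. Path compression alone keeps lookups cheap in practice for a sketch like this.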
📺 Watch this
At the end of 2023, Jitsu embarked on a new phase of its product development, introducing Jitsu Next.
The new product looks stunning and has many more features, so kudos to the team for shipping such large changes!
Getting started
Jitsu is written in TypeScript, and while Node.js can be hard to deploy at times, the team has great documentation on getting started that is worth a read.
That said, Jitsu comes with a bunch of moving pieces (on top of databases and message brokers), so it's a bit more complex to deploy than many other applications.
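To get a feel for those moving pieces, you can treat the `depends_on` entries in Jitsu's docker-compose.yml (excerpted in the next section) as a dependency graph. A topological sort of that graph shows the order in which Compose can bring services up. The dependency map below was transcribed by hand from that excerpt, with kafka, redis and postgres standing in for the omitted infrastructure services. The code is an illustrative sketch, not something Jitsu ships.

```typescript
// Service -> services it depends_on, transcribed from the compose excerpt.
const dependsOn: Record<string, string[]> = {
  kafka: [],
  redis: [],
  postgres: [],
  console: ["redis", "postgres"],
  "sync-catalog-init": ["console"],
  bulker: ["console", "kafka"],
  rotor: ["console", "bulker"],
  ingest: ["console", "rotor"],
  syncctl: ["bulker"],
};

// Kahn's algorithm, grouped into "waves": every service in a wave has all of its
// dependencies in earlier waves, so a wave can start in parallel. Throws on a cycle.
function startupWaves(graph: Record<string, string[]>): string[][] {
  const pending = new Map<string, Set<string>>(
    Object.keys(graph).map((s): [string, Set<string>] => [s, new Set(graph[s])]),
  );
  const waves: string[][] = [];
  while (pending.size > 0) {
    const ready: string[] = [];
    pending.forEach((deps, service) => {
      if (deps.size === 0) ready.push(service);
    });
    if (ready.length === 0) throw new Error("dependency cycle (or unknown dependency) detected");
    ready.sort();
    ready.forEach((service) => pending.delete(service));
    pending.forEach((deps) => ready.forEach((service) => deps.delete(service)));
    waves.push(ready);
  }
  return waves;
}

startupWaves(dependsOn).forEach((wave, i) => console.log(`wave ${i + 1}: ${wave.join(", ")}`));
```

The result makes console the hub: bulker, rotor and ingest all wait on it. Because most of these dependencies use `condition: service_healthy`, Compose also waits for each dependency's healthcheck to pass, not just for its container to start.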
Deploying with Docker
Fortunately, Jitsu is Docker-friendly, which makes deployment much easier. There are many moving pieces, but you can take a look at the docker-compose.yml to see how they fit together.
The part of the docker-compose.yml that corresponds to Jitsu itself is reproduced below for easier reading:
```yaml
version: "3.8"
services:
  # ... removed message broker service (kafka) ...
  # ... removed cache service (redis) ...
  # ... removed db service (postgres) ...
  console:
    tty: true
    image: jitsucom/console:${DOCKER_TAG:-latest}
    restart: "unless-stopped"
    platform: linux/amd64
    environment:
      ROTOR_URL: "http://rotor:3401"
      ROTOR_AUTH_KEY: ${BULKER_TOKEN:-default}
      BULKER_URL: "http://bulker:3042"
      CONSOLE_RAW_AUTH_TOKENS: ${CONSOLE_TOKEN:-default}
      BULKER_AUTH_KEY: ${BULKER_TOKEN:-default}
      MIT_COMPLIANT: ${MIT_COMPLIANT:-false}
      DATABASE_URL: "postgresql://postgres:${POSTGRES_PASSWORD:-default}@postgres:5432/postgres?schema=newjitsu"
      REDIS_URL: "redis://default:${REDIS_PASSWORD:-default}@redis:6379"
      SEED_USER_EMAIL: ${SEED_USER_EMAIL:-}
      SEED_USER_PASSWORD: ${SEED_USER_PASSWORD:-}
      ENABLE_CREDETIALS_LOGIN: ${ENABLE_CREDETIALS_LOGIN:-true}
      GITHUB_CLIENT_ID: ${GITHUB_CLIENT_ID}
      GITHUB_CLIENT_SECRET: ${GITHUB_CLIENT_SECRET}
      SYNCS_ENABLED: ${SYNCS_ENABLED:-false}
      SYNCCTL_URL: "http://syncctl:${EXTERNAL_SYNCS_PORT:-3043}"
      SYNCCTL_AUTH_KEY: ${SYNCCTL_TOKEN:-default}
      GOOGLE_SCHEDULER_KEY: ${GOOGLE_SCHEDULER_KEY}
      JITSU_INGEST_PUBLIC_URL: "${JITSU_INGEST_PUBLIC_URL:-http://localhost:${JITSU_INGEST_PORT:-8080}/}"
      JITSU_PUBLIC_URL: "${JITSU_PUBLIC_URL:-${NEXTAUTH_URL:-http://localhost:${JITSU_UI_PORT:-3000}/}}"
      NEXTAUTH_URL: "${JITSU_PUBLIC_URL:-${NEXTAUTH_URL:-http://localhost:${JITSU_UI_PORT:-3000}/}}"
      UPDATE_DB: "true"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://console:3000/api/healthcheck"]
      interval: 2s
      timeout: 10s
      retries: 30
    extra_hosts:
      - "syncctl:host-gateway"
    depends_on:
      redis:
        condition: service_started
      postgres:
        condition: service_healthy
    ports:
      - "${JITSU_UI_PORT:-3000}:3000"
  sync-catalog-init:
    tty: true
    image: curlimages/curl
    restart: "on-failure"
    environment:
      CONSOLE_TOKEN: ${CONSOLE_TOKEN:-default}
    command: "curl --silent --output nul --show-error -H 'Authorization: Bearer service-admin-account:${CONSOLE_TOKEN:-default}' http://console:3000/api/admin/catalog-refresh?initial=true"
    depends_on:
      console:
        condition: service_healthy
  bulker:
    tty: true
    image: jitsucom/bulker:${DOCKER_TAG:-latest}
    platform: linux/amd64
    restart: "unless-stopped"
    environment:
      TERM: "xterm-256color"
      BULKER_KAFKA_BOOTSTRAP_SERVERS: "kafka:9092"
      BULKER_RAW_AUTH_TOKENS: ${BULKER_TOKEN:-default}
      BULKER_CONFIG_SOURCE: "http://console:3000/api/admin/export/bulker-connections"
      BULKER_CONFIG_SOURCE_HTTP_AUTH_TOKEN: "service-admin-account:${CONSOLE_TOKEN:-default}"
      BULKER_CACHE_DIR: "/tmp/cache"
      REDIS_URL: "redis://default:${REDIS_PASSWORD:-default}@redis:6379"
      BULKER_INTERNAL_TASK_LOG: '{"id":"task_log","metricsKeyPrefix":"syncs","usesBulker":true,"type":"postgres","options":{"mode":"stream"},"credentials":{"host":"postgres","port":5432,"sslMode":"disable","database":"postgres","password":"${POSTGRES_PASSWORD:-default}","username":"postgres","defaultSchema":"newjitsu"}}'
    healthcheck:
      test: ["CMD", "curl", "-f", "http://bulker:3042/health"]
      interval: 2s
      timeout: 10s
      retries: 15
    ports:
      - "${EXTERNAL_BULKER_PORT:-3042}:3042"
    depends_on:
      console:
        condition: service_healthy
      kafka:
        condition: service_healthy
  rotor:
    tty: true
    image: jitsucom/rotor:${DOCKER_TAG:-latest}
    platform: linux/amd64
    restart: "unless-stopped"
    environment:
      ROTOR_RAW_AUTH_TOKENS: ${BULKER_TOKEN:-default}
      BULKER_URL: "http://bulker:3042"
      BULKER_AUTH_KEY: ${BULKER_TOKEN:-default}
      KAFKA_BOOTSTRAP_SERVERS: "kafka:9092"
      REDIS_URL: "redis://default:${REDIS_PASSWORD:-default}@redis:6379"
      REPOSITORY_BASE_URL: "http://console:3000/api/admin/export/"
      REPOSITORY_AUTH_TOKEN: "service-admin-account:${CONSOLE_TOKEN:-default}"
      REPOSITORY_CACHE_DIR: "/tmp/cache"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://rotor:3401/health"]
      interval: 5s
      timeout: 10s
      retries: 15
    depends_on:
      console:
        condition: service_healthy
      bulker:
        condition: service_healthy
    # ports:
    #   - "3401:3401"
  ingest:
    tty: true
    image: jitsucom/ingest:${DOCKER_TAG:-latest}
    platform: linux/amd64
    restart: "unless-stopped"
    environment:
      TERM: "xterm-256color"
      INGEST_PUBLIC_URL: "${JITSU_INGEST_PUBLIC_URL:-http://localhost:${JITSU_INGEST_PORT:-8080}/}"
      INGEST_KAFKA_BOOTSTRAP_SERVERS: "kafka:9092"
      INGEST_RAW_AUTH_TOKENS: ${BULKER_TOKEN:-default}
      INGEST_REPOSITORY_URL: "http://console:3000/api/admin/export/streams-with-destinations"
      INGEST_SCRIPT_ORIGIN: "http://console:3000/api/s/javascript-library"
      INGEST_REPOSITORY_AUTH_TOKEN: "service-admin-account:${CONSOLE_TOKEN:-default}"
      INGEST_CACHE_DIR: "/tmp/cache"
      INGEST_REDIS_URL: "redis://default:${REDIS_PASSWORD:-default}@redis:6379"
      INGEST_ROTOR_URL: "http://rotor:3401"
      INGEST_ROTOR_AUTH_KEY: ${BULKER_TOKEN:-default}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://ingest:3049/health"]
      interval: 2s
      timeout: 10s
      retries: 15
    depends_on:
      console:
        condition: service_healthy
      rotor:
        condition: service_started
    ports:
      - "${JITSU_INGEST_PORT:-8080}:3049"
  syncctl:
    tty: true
    image: jitsucom/syncctl:${DOCKER_TAG:-latest}
    platform: linux/amd64
    restart: "on-failure"
    environment:
      TERM: "xterm-256color"
      HTTP_PORT: ${EXTERNAL_SYNCS_PORT:-3043}
      SYNCCTL_SYNCS_ENABLED: ${SYNCS_ENABLED:-false}
      SYNCCTL_RAW_AUTH_TOKENS: ${SYNCCTL_TOKEN:-default}
      SYNCCTL_DATABASE_URL: "postgresql://postgres:${POSTGRES_PASSWORD:-default}@127.0.0.1:${EXTERNAL_POSTGRES_PORT:-5432}/postgres?search_path=newjitsu"
      SYNCCTL_SIDECAR_DATABASE_URL: "postgresql://postgres:${POSTGRES_PASSWORD:-default}@${EXTERNAL_POSTGRES_HOST}:${EXTERNAL_POSTGRES_PORT:-5432}/postgres?search_path=newjitsu"
      SYNCCTL_BULKER_URL: "http://${EXTERNAL_BULKER_HOST}:${EXTERNAL_BULKER_PORT:-3042}"
      SYNCCTL_BULKER_AUTH_TOKEN: ${BULKER_TOKEN:-default}
      SYNCCTL_KUBERNETES_CLIENT_CONFIG: "${SYNCCTL_KUBERNETES_CLIENT_CONFIG:-local}"
      SYNCCTL_KUBERNETES_CONTEXT: "${SYNCCTL_KUBERNETES_CONTEXT}"
    network_mode: "host"
    depends_on:
      bulker:
        condition: service_healthy
```
This is a lot of config for quite a complex system, but it promises to scale well, since heavy operations like ingest and syncing run as separate services.
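One practical consequence of all that config is that almost every token and password falls back to the literal string `default` (`${BULKER_TOKEN:-default}`, `${CONSOLE_TOKEN:-default}`, `${POSTGRES_PASSWORD:-default}` and so on). For anything beyond a local trial, you should set real values, typically in a `.env` file next to docker-compose.yml, which Compose reads for variable substitution. The small sketch below prints such a file for the five variables in the excerpt that default to `default`. The `generateEnv` name is hypothetical, and Node.js is assumed as the runtime.

```typescript
import { randomBytes } from "node:crypto";

// Variables that the compose excerpt falls back to the literal string "default".
const SECRET_VARS = ["BULKER_TOKEN", "CONSOLE_TOKEN", "SYNCCTL_TOKEN", "POSTGRES_PASSWORD", "REDIS_PASSWORD"];

// One line per variable, each with 24 random bytes hex-encoded (48 characters).
// Hex keeps the values safe to embed in the postgresql:// and redis:// URLs
// and in the quoted curl command that the compose file builds from them.
function generateEnv(vars: string[] = SECRET_VARS): string {
  return vars.map((name) => `${name}=${randomBytes(24).toString("hex")}`).join("\n") + "\n";
}

// Print to stdout so the output can be reviewed, then redirected into .env.
console.log(generateEnv());
```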
🧑‍💻 Want to contribute?
Jitsu is open to contributions and has a well-maintained bug tracker.
Of course, the best place to start is their contributing guidelines.