Curiosity for Developers

Workspace

The Workspace is responsible for providing data processing and back-end functionality in the system.

Last updated 11 months ago

File Handling

File handling (or "file extraction") is the process of transforming files into text for further processing and indexing by the system. Curiosity supports most common file types.
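As a rough illustration of what "transforming files into text" can look like, the sketch below dispatches on the file extension and returns plain text. It is not Curiosity's implementation: the function `extract_text`, the helper class `_TextCollector`, and the handful of supported extensions are all hypothetical and chosen only to keep the example self-contained.

```python
import csv
import io
from html.parser import HTMLParser
from pathlib import Path


class _TextCollector(HTMLParser):
    """Hypothetical helper: collects the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Keep only non-empty text; tags themselves are dropped.
        # Simplification: text inside <script>/<style> is not filtered out.
        if data.strip():
            self.parts.append(data.strip())


def extract_text(filename, raw):
    """Turn raw file bytes into plain text, choosing a strategy by extension."""
    suffix = Path(filename).suffix.lower()
    text = raw.decode("utf-8", errors="replace")
    if suffix in {".txt", ".md"}:
        return text
    if suffix in {".html", ".htm"}:
        collector = _TextCollector()
        collector.feed(text)
        collector.close()
        return " ".join(collector.parts)
    if suffix == ".csv":
        rows = csv.reader(io.StringIO(text))
        return "\n".join(" ".join(row) for row in rows)
    raise ValueError(f"unsupported file type: {suffix}")


html_text = extract_text("page.html", b"<h1>Title</h1><p>Hello world</p>")
csv_text = extract_text("table.csv", b"a,b\n1,2")
```

The extracted text is what a later stage would index; binary formats such as PDFs or Office documents would need dedicated extractors that are out of scope for this sketch.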

Graph Database

Curiosity workspaces are built on top of an embedded in-memory graph database. The database is a labeled property graph and supports a Gremlin-like query language. Learn more about the graph database.

Security

Curiosity includes a range of security features, including encryption (in transit and at rest), access management, logging, and auditing. Learn more about security in Curiosity.

Permissions

Curiosity also includes permissions (access control), which ensure that user permissions from the source data are enforced in the Curiosity system. Learn more about permissions and configuring access control.

Business Logic

You can run custom business logic on Curiosity Workspaces, which is useful for adding custom features without hosting code outside the system. Business logic can be implemented as endpoints, scheduled tasks or per-data-type indexes. Learn more about endpoints and check the sample workspaces to see how they can be used to run custom business logic.

Natural Language Processing (NLP)

Curiosity includes natural language processing for tasks such as:

  • Language recognition
  • Tokenization
  • Lemmatization
  • Syntax parsing
  • Named Entity Recognition (NER)
  • Vectorization
  • Topic modelling

Optical Character Recognition (OCR) & Speech to Text (STT)

Curiosity includes integrated optical character recognition (OCR) and speech-to-text (STT) models. These models allow data from images and video/audio to be transformed into text for further processing. This enables search across images, scanned PDFs, videos and audio files.

Search Engine

Curiosity integrates with a search engine that includes a range of features like full-text search, faceting, boosting, synonyms, and more. Learn more about search in Curiosity.

Vector Search

Vector search (also known as "semantic search" or "neural search") uses embeddings to represent meaning and retrieve related results. Curiosity includes out-of-the-box support for vector search:

  • Embedding models: Curiosity supports multi-modal embeddings and integrates them tightly with the search and graph database engines. Embeddings are computed automatically at index time and whenever data changes. Curiosity also supports using external embedding models.
  • Vector retrieval ("vector indexing"): Curiosity automatically indexes embeddings for querying and searching using HNSW indexes.

LLMs

Large language models (LLMs) are machine learning models trained to predict text based on text input. Curiosity integrates with both commercial and open-source large language models. The following models are currently supported:

  • GPT-3.5, GPT-4, GPT-4o
  • Llama2 7B, Llama2 13B, Llama3 8B
  • MistralLite 7B, Mistral 7B
  • Phi 2B

Locally hosted models can be run on CPU or GPU, depending on available hardware. For more details about the supported models, visit our website.
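To make the vector retrieval idea from the Vector Search section concrete, here is a minimal sketch of embedding-based lookup using cosine similarity. It scans every vector exhaustively, whereas an HNSW index finds approximate nearest neighbours without a full scan, so this illustrates the concept only and is not how Curiosity implements it. The function names, document ids and three-dimensional toy embeddings are assumptions made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query, index, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
index = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.3, 0.0],
    "doc_tax": [0.0, 0.1, 0.9],
}

top = nearest([1.0, 0.2, 0.0], index, k=2)
```

Here the query vector points in roughly the same direction as the two pet-related documents, so those are ranked ahead of the unrelated one even though no keywords are compared.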
Figure: Curiosity Architecture