Workspace

The Workspace is responsible for providing data processing and back-end functionality in the system.

File Handling

File handling (or "file extraction") is the process of transforming files into text for further processing and indexing by the system. Curiosity supports most common file types.

Graph Database

Curiosity workspaces are built on top of an embedded in-memory graph database. The database is a labeled property graph, and supports a Gremlin-like query language. Learn more about the graph database.
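To make the "labeled property graph" model concrete, here is a minimal sketch in Python. This is purely illustrative — it is not Curiosity's actual API — but it shows the core idea: nodes carry a label plus a property map, edges are labeled, and traversals follow labeled edges in the style of Gremlin's `out()` step.

```python
# Minimal labeled property graph sketch (illustrative only; not Curiosity's API).
class Node:
    def __init__(self, label, **props):
        self.label = label      # node label, e.g. "Document"
        self.props = props      # property map, e.g. {"title": "..."}

class Graph:
    def __init__(self):
        self.nodes = []
        self.edges = []         # (source_node, edge_label, target_node)

    def add_node(self, label, **props):
        node = Node(label, **props)
        self.nodes.append(node)
        return node

    def add_edge(self, source, label, target):
        self.edges.append((source, label, target))

    def out(self, node, edge_label):
        """Follow outgoing edges with the given label (akin to Gremlin's out())."""
        return [t for (s, l, t) in self.edges if s is node and l == edge_label]

g = Graph()
doc = g.add_node("Document", title="Q3 Report")
person = g.add_node("Person", name="Ada")
g.add_edge(doc, "authored_by", person)

# A Gremlin-like traversal: document -> authors -> names
authors = [n.props["name"] for n in g.out(doc, "authored_by")]
print(authors)  # ['Ada']
```

A real graph database adds persistence, indexes, and a query language on top, but the data model — labeled nodes, properties, labeled edges — is the same.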

Security

Curiosity includes a range of security features, including encryption (in transit and at rest), access management, logging, auditing, and more. Learn more about security in Curiosity.

Permissions

Curiosity also includes permissions, or access control. This ensures that user permissions from the source data are enforced within the Curiosity system. Learn more about permissions and configuring access control.

Business Logic

You can run custom business logic on Curiosity Workspaces. This is useful for adding custom features without needing to host code outside the system. Business logic can be implemented as endpoints, scheduled tasks, or per-data-type indexes. Learn more about endpoints and check the sample workspaces to see how they can be used to run custom business logic.

Natural Language Processing (NLP)

Curiosity includes natural language processing for tasks such as:

  • Language recognition

  • Tokenization

  • Lemmatization

  • Syntax parsing

  • Named Entity Recognition (NER)

  • Vectorization

  • Topic modelling
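The first steps of such a pipeline can be sketched in a few lines of plain Python. This is a toy example to illustrate tokenization and lemmatization only — Curiosity ships integrated models for these tasks, and the `LEMMAS` lookup table here is a stand-in for a real lemmatizer:

```python
import re

# Toy dictionary lemmatizer (a stand-in for a real model's morphology handling).
LEMMAS = {"running": "run", "searches": "search", "documents": "document"}

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def lemmatize(tokens):
    """Map each token to its dictionary form where known."""
    return [LEMMAS.get(t, t) for t in tokens]

tokens = tokenize("Running searches across documents")
print(lemmatize(tokens))  # ['run', 'search', 'across', 'document']
```

Real systems replace the regex with language-aware tokenization and the lookup table with trained models, but the pipeline shape — raw text in, normalized tokens out — is the same.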

Optical Character Recognition (OCR) & Speech to Text (STT)

Curiosity includes integrated optical character recognition (OCR) and speech-to-text (STT) models. These models allow data from images and video/audio to be transformed into text for further processing. This enables search across images, scanned PDFs, videos, and audio files.

Search Engine

Curiosity integrates with a search engine that includes a range of features like full text search, faceting, boosting, synonyms, and more. Learn more about search in Curiosity.
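The core of any full-text search engine is an inverted index, mapping each term to the set of documents that contain it. The sketch below is illustrative only — it is not Curiosity's search engine internals — and shows how multi-term queries are answered by intersecting posting sets:

```python
from collections import defaultdict

# Toy inverted index sketch (illustrative; not Curiosity's search internals).
documents = {
    1: "graph database query",
    2: "full text search engine",
    3: "vector search with embeddings",
}

# Build the index: term -> set of ids of documents containing that term.
index = defaultdict(set)
for doc_id, text in documents.items():
    for term in text.split():
        index[term].add(doc_id)

def search(query):
    """Return ids of documents containing every query term (AND semantics)."""
    results = [index[term] for term in query.split()]
    return set.intersection(*results) if results else set()

print(search("search"))       # {2, 3}
print(search("graph query"))  # {1}
```

Features like faceting, boosting, and synonyms are layered on top of this structure: synonyms expand the query terms, boosting weights the ranking of the matched set, and facets aggregate over document properties.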

Vector search (also known as "semantic search" or "neural search") uses embeddings to represent meaning and retrieve related results. Curiosity includes out-of-the-box support for vector search:

Embedding models: Curiosity has support for multi-modal embeddings, and integrates them tightly with the search and graph database engines. Embeddings are computed automatically at index time and whenever data changes. Curiosity also provides support for using external embedding models.

Vector retrieval ("vector indexing"): Curiosity automatically indexes embeddings for querying and searching using HNSW (Hierarchical Navigable Small World) indexes.
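The query an HNSW index answers can be illustrated with an exact, brute-force baseline: rank stored vectors by cosine similarity to a query vector. The sketch below is that O(n) reference behavior (with made-up three-dimensional vectors); an HNSW index returns approximately the same top results in sub-linear time:

```python
import math

# Brute-force nearest-neighbor search over embeddings (illustrative baseline).
# An HNSW index answers the same query approximately, without scanning
# every vector; this exact scan is the O(n) reference behavior.
def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-d embeddings; real models produce hundreds of dimensions.
embeddings = {
    "doc_a": [1.0, 0.0, 0.1],
    "doc_b": [0.0, 1.0, 0.9],
    "doc_c": [0.9, 0.1, 0.0],
}

def nearest(query, k=2):
    """Rank stored vectors by similarity to the query; return the top k ids."""
    ranked = sorted(embeddings, key=lambda d: cosine(query, embeddings[d]), reverse=True)
    return ranked[:k]

print(nearest([1.0, 0.0, 0.0]))  # ['doc_a', 'doc_c']
```

Because similar texts get nearby embeddings, the top-ranked documents are the semantically related ones — even when they share no keywords with the query.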

LLMs

Large language models (LLMs) are machine learning models that are trained to predict text based on text input. Curiosity integrates with both commercial and open-source large language models. The following models are currently supported:

  • GPT-3.5, GPT-4, GPT-4o

  • Llama2 7B, Llama2 13B, Llama3 8B

  • MistralLite 7B, Mistral 7B

  • Phi 2B

Locally hosted models can run on CPU or GPU, depending on available hardware. For more details about the supported models, visit our website.
