Concepts#
This page explains the core ideas behind NeuroLang. You do not need to read this before the Get Started with NeuroLang — it is here as a reference once you want to understand why things work the way they do.
Logic Programming & Datalog#
Logic programming is a style of programming where you declare what is true, not how to compute it. A Datalog program consists of two kinds of statements:
Facts — ground truths about the world:
with nl.environment as e:
e.region["V1"] = True
e.region["V2"] = True
e.adjacent["V1", "V2"] = True
Rules — derived truths (if A and B are true, then C is true):
with nl.environment as e:
# x is reachable from y if x is adjacent to y
e.reachable[e.x, e.y] = e.adjacent[e.x, e.y]
# x is reachable from y if x is adjacent to z and z is reachable from y
e.reachable[e.x, e.y] = e.adjacent[e.x, e.z] & e.reachable[e.z, e.y]
The Datalog engine (the solver) computes all facts that can be derived from the rules. NeuroLang extends standard Datalog with:
Aggregations —
COUNT,MAX,SUMover sets of tuplesBuilt-in functions — register any Python callable as a Datalog symbol
Tuple-generating dependencies (TGDs) — open-world reasoning rules
Probabilistic Reasoning#
Standard Datalog is deterministic: a fact is either derived or it is not. In many neuroimaging applications we need to reason about uncertain data — for example, whether a brain region is functionally connected to another region given noisy fMRI data.
NeuroLang adds independent probabilistic facts: each tuple in a probabilistic relation is an independent Bernoulli random variable with an associated probability. For example, a term-to-region mapping derived from meta-analysis might state that the term “memory” is associated with region “hippocampus” with probability 0.87.
Queries over probabilistic data use possible worlds semantics: the answer to a query is the probability that the query holds in a randomly sampled world.
Note
The probabilistic extensions are implemented in
neurolang.probabilistic. See the example gallery for concrete
neuroimaging use cases.
Neuroimaging Integration#
NeuroLang treats neuroimaging data as relational data:
Volumetric images (NIfTI, loaded via nibabel) are represented as sets of
(voxel_id, intensity)or(x, y, z, intensity)tuples.Atlas labels (e.g., Destrieux, AAL) become
(label, region_id)relations.Ontologies (OWL/RDF via rdflib) are loaded as triple stores and queried with Datalog rules.
Coordinate activations (NeuroSynth) become
(pmid, x, y, z)probabilistic relations.
The NeurolangDL and
NeurolangPDL frontends expose helper methods
such as add_tuple_set() and add_atlas_set() to load these
data sources.
Declarative Engine Registry & CLI#
NeuroLang ships a command-line interface, neurolang-query, that provides
a quick way to run Datalog queries against pre-configured datasets without
writing Python:
# List available dataset engines
neurolang-query --list-engines
# List predicates for the NeuroSynth engine
neurolang-query --engine neurosynth --list-predicates
# Run a Datalog query
neurolang-query "ans(t) :- term_in_study_tfidf(t, w, s)"
# Use the Destrieux atlas engine
neurolang-query --engine destrieux "ans(name) :- destrieux(name, region)"
Engines are declared declaratively in engines/engines.yaml:
engines:
neurosynth:
description: "Neurosynth database — forward and reverse inference"
requires_mni_mask: true
python_init: "neurolang.utils.engines.neurosynth.init"
datalog_init: |
study_with_peaks(S) :- peak_reported(I, J, K, S)
relations:
extra_data: "shared/my_data.csv"
predicates:
peak_reported:
arity: 4
columns: [i, j, k, study_id]
description: "Reported activation peaks in voxel coordinates"
...
Each engine references a Python module that exports
init_engine(nl, mask, data_dir). The neurolang.utils.engine_registry
module handles YAML loading, MNI mask retrieval, and delegating to the
engine’s init script.
Init phases. When build_engine()
builds an engine, it runs up to nine phases in order:
Builtins (
builtins) — registers known functions (exp,log,startswith) as callable symbols. Simple list in YAML:builtins: [exp, log, startswith]
Base symbols (
use_base_symbols) — whenuse_base_symbols: trueis set (andrequires_mni_maskis also true), the engine registry automatically registers common neuroimaging symbols (agg_count,agg_create_region,agg_create_region_overlay,principal_direction,region_union) without needing a custom Python init script. This replaces callinginit_base_engine()from within a Python init module:use_base_symbols: true
Python init (
python_init) — imports the module and callsinit_engine(nl, mask, data_dir). Use this for downloading data, registering complex symbols, or loading neuroimaging atlases. Ifuse_base_symbolsalready covers your needs, this phase can be omitted entirely — see thedestrieuxengine entry for an example.Downloads (
downloads) — fetches files from URLs before the other phases. Supports archive extraction:downloads: - url: "https://example.com/data.tar.gz" dest: my_engine/ extract: true
Templates (
templates) — downloads neuroimaging templates via nilearn or TemplateFlow (if thetemplateflowpackage is installed). Each template can optionally register avoxel(i, j, k)predicate from its non-zero mask:templates: # Nilearn brain mask → voxel(i, j, k) predicate mni_brain: source: nilearn variant: brain_mask predicate: voxel # Nilearn T1 template (2 mm) — just download, no predicate mni_t1: source: nilearn variant: template resolution: 2 # TemplateFlow T1 template at 1 mm t1_highres: source: templateflow template: MNI152NLin2009cAsym resolution: 1 suffix: T1w
Supported nilearn variants:
brain_mask— MNI152 brain mask (load_mni152_brain_mask)gm_mask— grey-matter mask (load_mni152_gm_mask)wm_mask— white-matter mask (load_mni152_wm_mask)template— MNI152 T1 template (load_mni152_template; passresolution: 1orresolution: 2)gm_template— grey-matter templatewm_template— white-matter template
TemplateFlow templates require
pip install templateflowand are identified by atemplatename from the TemplateFlow repository (e.g.MNI152NLin2009cAsym,MNIInfant). Thesuffixandresolutionfields select the specific image variant.When
predicateis set, the template is loaded as a NIfTI image and all non-zero voxels are registered as apredicate(i, j, k)tuple set — useful as a coordinate-space reference in Datalog queries.Atlases (
atlases) — downloads brain atlases via nilearn and registers each region as a predicate withExplicitVBRgeometry:atlases: destrieux: predicate_name: destrieux schaefer: n_rois: 400 yeo_networks: 7 resolution_mm: 2 predicate_name: schaefer_400 difumo: dimension: 64 threshold: 0.5 predicate_name: difumo_64
For probabilistic atlases (
difumo) a second predicate is registered automatically —{name}_prob(component, i, j, k, prob)— with the raw probability value at each voxel for each component (filtered byprob_threshold, default 0.01):atlases: difumo: dimension: 64 threshold: 0.5 prob_threshold: 0.01 predicate_name: difumo_64 prob_predicate_name: difumo_64_prob
Supported atlases:
destrieux(deterministic),schaefer(deterministic),difumo(probabilistic).Datalog init (
datalog_init) — an optional YAML multiline string of Datalog rules. These are evaluated after templates and atlases so they can reference template-derived voxel predicates and atlas predicates:datalog_init: | left_region(N, R) :- destrieux(N, R), startswith('lh', N) right_region(N, R) :- destrieux(N, R), startswith('rh', N)
Relations (
relations) — loads CSV/TSV files as extensional predicates. Each entry maps a relation name to a file path relative to theengines/directory. Thefilevalue may be a URL for automatic download:relations: my_table: "my_engine/data.csv" extra_refs: "shared/references.tsv"
For richer metadata that appears in
--list-predicatesoutput, use a dict with an optionaldescriptionfield:relations: my_table: file: "https://example.com/data.csv" description: "Experiment metadata loaded from CSV"
Supported formats:
.csv,.tsv,.csv.gz,.tsv.gz. The file is read withpandas.read_csvand registered viaadd_tuple_set(). Because relations are loaded after the Datalog init, your derived rules can already reference them.Probabilistic choices (
probabilistic_choice) — declares uniform probabilistic choices over existing predicates:probabilistic_choice: selected_study: source: study description: "Uniform choice over studies"
Ontologies (
ontologies) — loads OWL/RDF ontologies from URLs or local paths:ontologies: - url: "https://example.com/ontology.owl"
All phases are optional — include only what your engine needs.
If use_base_symbols and the declarative YAML fields (atlases,
templates, relations, datalog_init, probabilistic_choice)
cover your engine, you can omit python_init entirely — the
destrieux engine is a complete example of this pattern.
To add a new engine, create a YAML entry. If you need a custom Python
init script, add a corresponding init module under
neurolang/utils/engines/. See
neurolang.utils.engines.neurosynth.init for a complete Python init
example, and the destrieux engine entry for a purely YAML-driven
example.
Architecture#
NeuroLang is structured in three layers:
┌─────────────────────────────────────────────────────────┐
│ User / Python │
│ NeurolangDL / NeurolangPDL frontend │
│ (neurolang/frontend/) │
└───────────────────────┬─────────────────────────────────┘
│ Datalog/probabilistic program
▼
┌─────────────────────────────────────────────────────────┐
│ Intermediate Representation (IR) │
│ Expressions, symbol table, type system │
│ (neurolang/expressions.py, neurolang/logic/) │
└───────────────────────┬─────────────────────────────────┘
│ Relational algebra plan
▼
┌─────────────────────────────────────────────────────────┐
│ Solver │
│ Chase algorithm, relational algebra, SDD/WMC │
│ (neurolang/datalog/, neurolang/probabilistic/) │
└─────────────────────────────────────────────────────────┘
The frontend layer translates Python expressions into an internal IR. The IR layer performs type inference and expression normalisation. The solver executes the query using a chase-based fixpoint algorithm (for deterministic queries) or a weighted model counter (for probabilistic queries).