neurolang.frontend.neurosynth_utils module

class neurolang.frontend.neurosynth_utils.StudyID

Bases: str

Methods

capitalize(/)

Return a capitalized version of the string.

casefold(/)

Return a version of the string suitable for caseless comparisons.

center(width[, fillchar])

Return a centered string of length width.

count(sub[, start[, end]])

Return the number of non-overlapping occurrences of substring sub in string S[start:end].

encode(/[, encoding, errors])

Encode the string using the codec registered for encoding.

endswith(suffix[, start[, end]])

Return True if S ends with the specified suffix, False otherwise.

expandtabs(/[, tabsize])

Return a copy where all tab characters are expanded using spaces.

find(sub[, start[, end]])

Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end].

format(*args, **kwargs)

Return a formatted version of S, using substitutions from args and kwargs.

format_map(mapping)

Return a formatted version of S, using substitutions from mapping.

index(sub[, start[, end]])

Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end].

isalnum(/)

Return True if the string is an alpha-numeric string, False otherwise.

isalpha(/)

Return True if the string is an alphabetic string, False otherwise.

isascii(/)

Return True if all characters in the string are ASCII, False otherwise.

isdecimal(/)

Return True if the string is a decimal string, False otherwise.

isdigit(/)

Return True if the string is a digit string, False otherwise.

isidentifier(/)

Return True if the string is a valid Python identifier, False otherwise.

islower(/)

Return True if the string is a lowercase string, False otherwise.

isnumeric(/)

Return True if the string is a numeric string, False otherwise.

isprintable(/)

Return True if the string is printable, False otherwise.

isspace(/)

Return True if the string is a whitespace string, False otherwise.

istitle(/)

Return True if the string is a title-cased string, False otherwise.

isupper(/)

Return True if the string is an uppercase string, False otherwise.

join(iterable, /)

Concatenate any number of strings.

ljust(width[, fillchar])

Return a left-justified string of length width.

lower(/)

Return a copy of the string converted to lowercase.

lstrip([chars])

Return a copy of the string with leading whitespace removed.

maketrans(x[, y, z])

Return a translation table usable for str.translate().

partition(sep, /)

Partition the string into three parts using the given separator.

replace(old, new[, count])

Return a copy with all occurrences of substring old replaced by new.

rfind(sub[, start[, end]])

Return the highest index in S where substring sub is found, such that sub is contained within S[start:end].

rindex(sub[, start[, end]])

Return the highest index in S where substring sub is found, such that sub is contained within S[start:end].

rjust(width[, fillchar])

Return a right-justified string of length width.

rpartition(sep, /)

Partition the string into three parts using the given separator.

rsplit(/[, sep, maxsplit])

Return a list of the words in the string, using sep as the delimiter string.

rstrip([chars])

Return a copy of the string with trailing whitespace removed.

split(/[, sep, maxsplit])

Return a list of the words in the string, using sep as the delimiter string.

splitlines(/[, keepends])

Return a list of the lines in the string, breaking at line boundaries.

startswith(prefix[, start[, end]])

Return True if S starts with the specified prefix, False otherwise.

strip([chars])

Return a copy of the string with leading and trailing whitespace removed.

swapcase(/)

Convert uppercase characters to lowercase and lowercase characters to uppercase.

title(/)

Return a version of the string where each word is titlecased.

translate(table, /)

Replace each character in the string using the given translation table.

upper(/)

Return a copy of the string converted to uppercase.

zfill(width, /)

Pad a numeric string with zeros on the left, to fill a field of the given width.
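
Example (a minimal sketch; StudyID exposes only the inherited str methods listed above, so it behaves like a plain string; the identifier value and construction from a single string argument are illustrative assumptions):

    from neurolang.frontend.neurosynth_utils import StudyID

    # StudyID values behave exactly like ordinary Python strings.
    study = StudyID("9065511")  # illustrative identifier value, not a real reference
    assert isinstance(study, str)
    assert study.isdigit()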

class neurolang.frontend.neurosynth_utils.TfIDf(x=0, /)

Bases: float

Attributes:
imag

the imaginary part of a complex number

real

the real part of a complex number

Methods

as_integer_ratio(/)

Return integer ratio.

conjugate(/)

Return self, the complex conjugate of any float.

fromhex(string, /)

Create a floating-point number from a hexadecimal string.

hex(/)

Return a hexadecimal representation of a floating-point number.

is_integer(/)

Return True if the float is an integer.
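
Example (a minimal sketch; TfIDf exposes only the inherited float methods listed above, so it behaves like a plain float; the value used is illustrative):

    from neurolang.frontend.neurosynth_utils import TfIDf

    # TfIDf values behave exactly like ordinary Python floats.
    weight = TfIDf(0.125)  # illustrative tfidf value
    assert isinstance(weight, float)
    assert weight * 2 == 0.25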

neurolang.frontend.neurosynth_utils.fetch_feature_data(data_dir: Path, version: int = 7, verbose: int = 1, convert_study_ids: bool = False) → DataFrame

Download, if needed, the tfidf_features.npz file from Neurosynth and load it into a pandas DataFrame. The tfidf_features file contains feature values for different types of “vocabularies”.

The feature values are stored as a compressed, sparse matrix. Once loaded and reconstructed as a dense matrix, it contains one row per study and one column per label. The associated labels and study ids are also loaded to build a dataframe of size N x P, where N is the number of studies in the Neurosynth dataset and P is the number of words in the vocabulary.

Parameters:
data_dir : Path

the path for the directory where downloaded data should be saved.

version : int, optional

the neurosynth data version, by default 7

verbose : int, optional

verbose param for nilearn’s _fetch_files, by default 1

convert_study_ids : bool, optional

if True, cast study ids as StudyID, by default False

Returns:
pd.DataFrame

the features dataframe
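
Example (a minimal usage sketch; the local data directory path is illustrative, not part of the API):

    from pathlib import Path

    from neurolang.frontend.neurosynth_utils import fetch_feature_data

    # Download (if needed) the tfidf features and load them as an N x P dataframe,
    # one row per study and one column per vocabulary term.
    data_dir = Path("~/neurolang_data").expanduser()  # hypothetical cache directory
    features = fetch_feature_data(data_dir, convert_study_ids=True)
    print(features.shape)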

neurolang.frontend.neurosynth_utils.fetch_neurosynth_peak_data(data_dir: Path, version: int = 7, verbose: int = 1, convert_study_ids: bool = False) → DataFrame

Download, if needed, the coordinates.tsv.gz file from Neurosynth and load it into a pandas DataFrame.

The coordinates.tsv.gz file contains the coordinates of the peaks reported by studies in the Neurosynth dataset, with one row per reported coordinate.

The metadata for each study is also loaded to add the space in which its coordinates are reported. The resulting peak dataframe therefore has PR rows, where PR is the number of peaks reported in the Neurosynth dataset.

The columns (for version 7) are:
  • id

  • table_id

  • table_num

  • peak_id

  • space

  • x

  • y

  • z

Parameters:
data_dir : Path

the path for the directory where downloaded data should be saved.

version : int, optional

the neurosynth data version, by default 7

verbose : int, optional

verbose param for nilearn’s _fetch_files, by default 1

convert_study_ids : bool, optional

if True, cast study ids as StudyID, by default False

Returns:
pd.DataFrame

the peak dataframe
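
Example (a minimal usage sketch; the local data directory path is illustrative, not part of the API):

    from pathlib import Path

    from neurolang.frontend.neurosynth_utils import fetch_neurosynth_peak_data

    # One row per reported peak, with the study id, the reporting space
    # and the x, y, z coordinates.
    data_dir = Path("~/neurolang_data").expanduser()  # hypothetical cache directory
    peaks = fetch_neurosynth_peak_data(data_dir)
    print(peaks[["id", "space", "x", "y", "z"]].head())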

neurolang.frontend.neurosynth_utils.fetch_study_metadata(data_dir: Path, version: int = 7, verbose: int = 1) → DataFrame

Download, if needed, the metadata.tsv.gz file from Neurosynth and load it into a pandas DataFrame.

The metadata table contains the metadata for each study, with one study (id) per line. The ids are in the same order as the id column of the associated coordinates.tsv.gz file, although the rows differ because the coordinates file contains multiple rows per study. They are also in the same order as the rows of the features.npz file for the same version.

The metadata table therefore has N rows, where N is the number of studies in the Neurosynth dataset. The columns (for version 7) are:

  • id

  • doi

  • space

  • title

  • authors

  • year

  • journal

Parameters:
data_dir : Path

the path for the directory where downloaded data should be saved.

version : int, optional

the neurosynth data version, by default 7

verbose : int, optional

verbose param for nilearn’s _fetch_files, by default 1

Returns:
pd.DataFrame

the study metadata dataframe
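
Example (a minimal usage sketch; the local data directory path is illustrative, not part of the API):

    from pathlib import Path

    from neurolang.frontend.neurosynth_utils import fetch_study_metadata

    # One row per study; columns include id, doi, space, title, authors, year, journal.
    data_dir = Path("~/neurolang_data").expanduser()  # hypothetical cache directory
    metadata = fetch_study_metadata(data_dir)
    print(metadata[["id", "title", "year"]].head())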

neurolang.frontend.neurosynth_utils.get_ns_mni_peaks_reported(data_dir: Path, version: int = 7, verbose: int = 1, convert_study_ids: bool = False) → DataFrame

Load a dataframe containing the coordinates of the peaks reported by studies in the Neurosynth dataset. Peak coordinates are in MNI space; coordinates originally reported in Talairach space have been converted to MNI.

The resulting dataframe contains one row for each peak reported. Each row has 4 columns:

  • id

  • x

  • y

  • z

Parameters:
data_dir : Path

the path for the directory where downloaded data should be saved.

version : int, optional

the neurosynth data version, by default 7

verbose : int, optional

verbose param for nilearn’s _fetch_files, by default 1

convert_study_ids : bool, optional

if True, cast study ids as StudyID, by default False

Returns:
pd.DataFrame

the peak dataframe
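
Example (a minimal usage sketch; the local data directory path is illustrative, not part of the API):

    from pathlib import Path

    from neurolang.frontend.neurosynth_utils import get_ns_mni_peaks_reported

    # Peaks in MNI space: one row per reported peak with columns id, x, y, z.
    data_dir = Path("~/neurolang_data").expanduser()  # hypothetical cache directory
    mni_peaks = get_ns_mni_peaks_reported(data_dir)
    print(mni_peaks.head())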

neurolang.frontend.neurosynth_utils.get_ns_term_study_associations(data_dir: Path, version: int = 7, verbose: int = 1, convert_study_ids: bool = False, tfidf_threshold: float | None = None) → DataFrame

Load a dataframe containing associations between terms and studies. The dataframe contains one row for each (term, study) pair from the features table in the Neurosynth dataset, together with the tfidf value for that term in that study. If a tfidf_threshold value is passed, only (term, study) associations with a tfidf value > tfidf_threshold are kept.

Parameters:
data_dir : Path

the path for the directory where downloaded data should be saved.

version : int, optional

the neurosynth data version, by default 7

verbose : int, optional

verbose param for nilearn’s _fetch_files, by default 1

convert_study_ids : bool, optional

if True, cast study ids as StudyID, by default False

tfidf_threshold : Optional[float], optional

the minimum tfidf value for the (term, study) associations, by default None

Returns:
pd.DataFrame

the term association dataframe
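
Example (a minimal usage sketch; the local data directory path and the threshold value are illustrative, not part of the API):

    from pathlib import Path

    from neurolang.frontend.neurosynth_utils import get_ns_term_study_associations

    # Keep only (term, study) pairs whose tfidf value exceeds the threshold.
    data_dir = Path("~/neurolang_data").expanduser()  # hypothetical cache directory
    associations = get_ns_term_study_associations(
        data_dir,
        convert_study_ids=True,
        tfidf_threshold=0.01,  # threshold chosen for illustration only
    )
    print(associations.head())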