This tutorial explains the process of creating semantic speech networks. You can find out more about netts in our paper. To see how you can analyze semantic speech networks after creating them with netts, check out my tutorial on analyzing with netts. It picks up after the end of this tutorial, when you have generated several networks.
To understand how netts works, check out my brief explanation of the netts pipeline. I’ve also written about netts on the Cambridge Accelerate Science Blog. You can find all links at the bottom of this page.
The tutorial is based on a Jupyter notebook that goes through the steps of creating semantic speech networks with netts. You can follow along using the creating_networks notebook in the tutorial repository. The repository includes all of the example data you need to go through the steps yourself.
1. Installing Netts
We will begin by quickly walking you through installing netts. Ideally, you create a Python environment in your project folder and install netts into it (this guide will show you how). You can also install netts without a Python environment if you prefer not to work with one.
Set up a virtual environment of your choice. Here we use pyenv and Python 3.9; you can replace this with anything else you prefer, it’s just what I’ve been using:
python3.9 -m venv .venv
source .venv/bin/activate
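If you want to confirm from inside Python or a notebook that the environment is actually active before installing anything, a minimal sketch could look like this. The helper name `in_virtualenv` is my own and not part of netts; the check relies on the standard behaviour of environments created with `venv`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


version = f"{sys.version_info.major}.{sys.version_info.minor}"
print(f"Python {version}, virtual environment active: {in_virtualenv()}")
```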
To install the latest official release of netts from PyPI, open up a terminal and from the command line prompt run:
pip install netts
Install Additional Dependencies
Netts may require the Java Runtime Environment. Instructions for downloading and installing for your operating system can be found here.
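Before installing the additional dependencies, it can help to check whether a `java` executable is already available. The sketch below uses only the Python standard library; the function name `find_java` is hypothetical, and finding an executable on the PATH does not guarantee that its version is one netts supports.

```python
import shutil
from typing import Optional


def find_java() -> Optional[str]:
    """Return the path of the `java` executable on PATH, or None if there is none."""
    return shutil.which("java")


java_path = find_java()
if java_path is None:
    print("No java executable found on PATH; install a Java Runtime Environment first.")
else:
    print(f"Found java at {java_path}")
```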
Netts requires additional dependencies including CoreNLP and OpenIE. You can install them either directly from the netts CLI or in Python.
To install using the CLI:
netts install
To install in a notebook:
import netts
settings = netts.get_settings()
print(f"Installing dependencies to {settings.netts_dir}")
netts.install_dependencies()
2. Constructing a semantic speech network
Netts takes speech transcripts and converts them into a semantic graph. Imagine we have the following short transcript in a file called transcript_1.txt in the folder transcripts/:
I see a man and he is wearing a jacket. He is standing in the dark against a light post. On the picture there seems to be like a park and… Or trees but in those trees there are little balls of light reflections as well. I cannot see the… Anything else because it’s very dark. But the man on the picture seems to wear a hat and he seems to have a hoodie on as well. The picture is very mysterious, which I like about it, but for me I would like to understand more about the picture.
We can create a semantic graph from the transcript using either the command line interface (CLI) or the Python package.
Let’s first print the content of transcript_1.txt:
%%bash
cat transcripts/transcript_1.txt
I see a man and he is wearing a jacket. He is standing in the dark against a light post. On the picture there seems to be like a park and... Or trees but in those trees there are little balls of light reflections as well. I cannot see the... Anything else because it’s very dark. But the man on the picture seems to wear a hat and he seems to have a hoodie on as well. The picture is very mysterious, which I like about it, but for me I would like to understand more about the picture.
Note
If you are following along using the notebooks in my tutorial repository, you will need to clear the already generated outputs before running this command. Remove the networks from output_folder or rename them. Otherwise netts will not generate any new networks, because it will find networks matching your input filenames in the specified output folder.
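One way to clear old outputs without deleting them is to move them into a backup folder. The following is a small sketch under my own assumptions: the function name `archive_outputs` and the backup folder name are hypothetical, and it only moves files ending in .pickle or .png, the two output types this tutorial mentions.

```python
import shutil
from pathlib import Path
from typing import List


def archive_outputs(output_dir: str, backup_dir: str) -> List[Path]:
    """Move existing .pickle and .png files from output_dir into backup_dir."""
    source = Path(output_dir)
    target = Path(backup_dir)
    moved: List[Path] = []
    if not source.is_dir():
        return moved  # nothing has been generated yet
    target.mkdir(parents=True, exist_ok=True)
    for path in sorted(source.iterdir()):
        if path.is_file() and path.suffix in {".pickle", ".png"}:
            destination = target / path.name
            shutil.move(str(path), str(destination))
            moved.append(destination)
    return moved
```

For the tutorial repository you could call `archive_outputs("output_folder", "output_folder_backup")` before running netts. Use a fresh backup folder each time so that earlier backups are not overwritten.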
2.1 Command Line Interface
We can process a single transcript with the Command Line Interface (CLI) like this:
netts run transcript.txt outputs
Inputs
We can break this down into the following components:
CLI Command | transcript.txt | outputs |
---|---|---|
netts run | Path to transcript | path of output directory |
transcript.txt can be replaced with the full path to any .txt file.
outputs can be replaced with the path to any directory. If the directory does not exist yet, netts will create it.
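If you prefer to launch the CLI from a Python script rather than a `%%bash` cell, a sketch using the standard `subprocess` module might look like this. The helper names `build_netts_command` and `run_netts` are my own; the only netts-specific part is the `netts run <transcript> <output directory>` form described above.

```python
import subprocess
from typing import List


def build_netts_command(transcript_path: str, output_dir: str) -> List[str]:
    """Return the argument list for `netts run <transcript> <output_dir>`."""
    return ["netts", "run", transcript_path, output_dir]


def run_netts(transcript_path: str, output_dir: str) -> int:
    """Run netts on one transcript and return the exit code of the process."""
    command = build_netts_command(transcript_path, output_dir)
    # check=False leaves it to the caller to decide how to handle a failure.
    completed = subprocess.run(command, check=False)
    return completed.returncode
```

Passing the command as a list rather than a single string avoids quoting problems when a path contains spaces.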
We’re going to run this inside a Jupyter notebook cell now:
%%bash
netts run transcripts/transcript_1.txt output_folder
[03/22/23 19:47:41] INFO For logging information, please check
/Users/callithrix/Documents/Projects/Cambridge_Nett
s/code/netts_demo/netts_log.log
Starting CoreNLP Server...
Processing Transcript(s)...
Netts will let you know if it has found output files (png or pickle files) for the transcripts you are trying to process. In that case, netts gives a warning and skips any transcripts that have already been processed. If you would like to generate these files again, move your old files out of the folder or rename them, then re-run netts on your input transcript.
Outputs
Once netts processes the transcript the output directory will contain two files:
outputs/
    transcript.pickle
    transcript.png
The file prefix is taken from the input file (in this case transcript.txt).
Netts will also produce a log file. Any serious issues will be printed out to the console, but minor pieces of information will end up in a log file located in your output directory.
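The naming rule above (input file prefix plus .pickle and .png) can be sketched in a few lines. This is an illustration of the rule as the tutorial describes it, not code taken from netts; `expected_outputs` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict


def expected_outputs(transcript_path: str, output_dir: str) -> Dict[str, Path]:
    """Return the output paths netts is described as writing for one transcript."""
    stem = Path(transcript_path).stem  # "transcript_1.txt" -> "transcript_1"
    out = Path(output_dir)
    return {
        "pickle": out / f"{stem}.pickle",
        "png": out / f"{stem}.png",
    }


paths = expected_outputs("transcripts/transcript_1.txt", "output_folder")
for kind, path in paths.items():
    print(f"{kind}: {path} (exists: {path.exists()})")
```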
2.2 Python Interface
If you don’t want to use the netts command line interface (CLI), or want more control over netts, you can use the netts Python package directly. Here we’ll run through the example transcript in Python.
The transcript transcript_1.txt is in the directory transcripts/. We will open it with Python and process it with netts.
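import matplotlib.pyplot as plt
import netts

# Load the netts settings (defined again here so this cell runs on its own)
settings = netts.get_settings()

# Start the OpenIE and CoreNLP clients; both are closed when the block exits
with netts.OpenIEClient() as openie_client, netts.CoreNLPClient(
    properties={"annotators": "tokenize,ssplit,pos,lemma,parse,depparse,coref,openie", "be_quiet": "true"},
) as corenlp_client:
    # Read the transcript text
    with open("transcripts/transcript_1.txt", encoding="utf-8") as f:
        transcript = f.read()

    # Build the semantic speech network from the transcript
    network = netts.SpeechGraph(transcript)
    network.process(
        openie_client=openie_client,
        corenlp_client=corenlp_client,
        preprocess_config=settings.netts_config.preprocess,
    )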
Netts produces a lot of output. At the command line, you can choose whether netts prints all of its output or runs quietly. By default, netts runs quietly and writes the full output only to the log file. If you would like to see the full netts output at the command line, for example to check that every step runs correctly, you can use the command line option --verbose. To do that, you run: netts --verbose transcript_1.txt output_folder
We can save the constructed semantic speech network as a pickle file for later analysis. Use the netts function pickle_graph for this.
# Save the graph object as a pickle file
with open("output_folder/transcript.pickle", "wb") as output_f:
    netts.pickle_graph(network, output_f)
Pickle files are a way to save Python objects. You can later load the saved pickle file back into Python using the standard pickle module (the networkx function read_gpickle, which older examples use, was removed in networkx 3.0):
import pickle

with open("output_folder/transcript.pickle", "rb") as graph_file:
    network = pickle.load(graph_file)
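Once the file is loaded, a quick sanity check is to count nodes and edges. The sketch below assumes the unpickled object exposes the networkx graph methods `number_of_nodes()` and `number_of_edges()`; I have not verified which object type netts stores, so treat that as an assumption. `load_graph` and `describe_graph` are hypothetical helper names.

```python
import pickle
from typing import Any, Dict


def load_graph(path: str) -> Any:
    """Load a pickled object from disk. Only unpickle files you trust."""
    with open(path, "rb") as graph_file:
        return pickle.load(graph_file)


def describe_graph(graph: Any) -> Dict[str, int]:
    """Return node and edge counts, assuming a networkx-style graph interface."""
    return {
        "nodes": graph.number_of_nodes(),
        "edges": graph.number_of_edges(),
    }
```

For example, `describe_graph(load_graph("output_folder/transcript.pickle"))` would return a small dictionary of counts if the assumption about the object type holds. Because unpickling can execute arbitrary code, only load pickle files that you or a trusted source created.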
3. Constructing several networks
If you have a folder of transcripts you can process the entire folder with the CLI. For example, if you have a folder called input_folder/ inside transcripts/:
transcripts/
    input_folder/
        transcript_2.txt
        transcript_3.txt
        transcript_4.txt
        transcript_5.txt
You can process all of them by submitting the folder to netts with the Command Line:
%%bash
netts run transcripts/input_folder output_folder
[03/22/23 19:19:08] INFO For logging information, please check
/Users/callithrix/Documents/Projects/Cambridge_Nett
s/code/netts_demo/netts_log.log
Starting CoreNLP Server...
Processing Transcript(s)...
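The same batch run can be sketched with the Python interface from section 2.2. This is a rough sketch, not the way the CLI is implemented: `collect_transcripts` and `process_folder` are hypothetical helpers, the netts calls are the ones already shown in this tutorial, and unlike the CLI this version writes only pickle files and does not skip transcripts that already have outputs.

```python
from pathlib import Path
from typing import List


def collect_transcripts(input_dir: str) -> List[Path]:
    """Return the .txt files directly inside input_dir, sorted by name."""
    folder = Path(input_dir)
    return sorted(p for p in folder.iterdir() if p.is_file() and p.suffix == ".txt")


def process_folder(input_dir: str, output_dir: str) -> None:
    """Build one network per transcript and pickle it into output_dir."""
    import netts  # imported here so collect_transcripts works without netts

    settings = netts.get_settings()
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Start both clients once and reuse them for every transcript.
    with netts.OpenIEClient() as openie_client, netts.CoreNLPClient(
        properties={"annotators": "tokenize,ssplit,pos,lemma,parse,depparse,coref,openie", "be_quiet": "true"},
    ) as corenlp_client:
        for path in collect_transcripts(input_dir):
            transcript = path.read_text(encoding="utf-8")
            network = netts.SpeechGraph(transcript)
            network.process(
                openie_client=openie_client,
                corenlp_client=corenlp_client,
                preprocess_config=settings.netts_config.preprocess,
            )
            # Name the output after the input file, as the CLI does.
            with open(out / f"{path.stem}.pickle", "wb") as output_f:
                netts.pickle_graph(network, output_f)
```

Calling `process_folder("transcripts/input_folder", "output_folder")` would then mirror the CLI command above for the pickle outputs.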
If you would like to know more about how netts creates semantic speech networks, have a look at my walkthrough of the netts pipeline.
Now that we understand how we can use netts to create semantic speech networks, let’s look at how we can analyze them. Head on over to the tutorial on analyzing netts networks.
Links
Documentation: https://alan-turing-institute.github.io/netts/
Tutorials:
Analyzing with netts
Netts Pipeline
Paper: Schizophrenia Bulletin
Media Coverage: Medscape Article
Cambridge Blog:
How NLP can help us understand schizophrenia
Engineering a tool to learn about schizophrenia