Installation Website
NLTK is the Natural Language Toolkit. It is installed via the conda package management system and may already be installed in your environment.
conda install nltk -y
Fetching package metadata .............
Solving package specifications: .
Package plan for installation in environment /Users/hannah/miniconda3/envs/installenv:
The following NEW packages will be INSTALLED:
nltk: 3.3.0-py36_0
You then need to install the data that NLTK relies on. The download may take several minutes, depending on your internet connection. Some packages may fail to install because they are outdated; this is fine and will not affect our lessons. If you get an error about a package failing, close the downloader and move on to the installation check.
To start, open a terminal and type
python
to launch a Python interpreter. You should get something like this:
Python 3.6.5 |Anaconda, Inc.| (default, Apr 26 2018, 08:42:37)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
Alternatively, you can launch a Jupyter Notebook. In either environment, import NLTK:
Python 3.6.5 |Anaconda, Inc.| (default, Apr 26 2018, 08:42:37)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
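If this step fails, NLTK is not installed in the active environment. Exit Python, run conda install nltk -y in a terminal, and then try the import again.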
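If you would rather test for the package without triggering an ImportError, the following is a minimal sketch using only the standard library. The helper name is_installed is hypothetical and is not part of NLTK.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report whether NLTK is visible to the interpreter that runs this code.
print("nltk installed:", is_installed("nltk"))
```

Run it in the same interpreter or notebook kernel you plan to use for the lessons, since each conda environment has its own set of packages.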
Load the NLTK GUI download tool:
nltk.download()
For example, the interpreter above would now look like:
Python 3.6.5 |Anaconda, Inc.| (default, Apr 26 2018, 08:42:37)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download()
The Python environment that the GUI was launched from should now have a message that looks something like this:
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
From here, you can choose what you would like to download. We recommend that you download everything by selecting all.
This may take several minutes, depending on your internet connection. If the install stalls, press the refresh button. Errors on individual packages can be ignored.
NLTK also provides a text-based download tool:
nltk.download('all', halt_on_error=False)
For example, the interpreter above would now look like:
Python 3.6.5 |Anaconda, Inc.| (default, Apr 26 2018, 08:42:37)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download('all', halt_on_error=False)
[nltk_data] Downloading collection 'all'
[nltk_data] |
[nltk_data] | Downloading package abc to
[nltk_data] | /usr/local/share/nltk_data...
[nltk_data] | Package abc is already up-to-date!
...omitted...
[nltk_data] | Downloading package mwa_ppdb to
[nltk_data] | /usr/local/share/nltk_data...
[nltk_data] | Package mwa_ppdb is already up-to-date!
[nltk_data] |
[nltk_data] Downloaded collection 'all' with errors
True
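Because the collection can finish "with errors", it may help to confirm that the data used in the checks that follow is actually present. The sketch below assumes the resource paths corpora/brown and help/tagsets, which match older NLTK releases and may be named differently in newer ones. The helper missing_resources is hypothetical; it relies on nltk.data.find raising LookupError when a resource cannot be located.

```python
from typing import Callable, Iterable, List


def missing_resources(paths: Iterable[str], find: Callable[[str], object]) -> List[str]:
    """Return the resource paths that `find` cannot locate."""
    missing = []
    for path in paths:
        try:
            find(path)
        except LookupError:
            # nltk.data.find raises LookupError for resources it cannot locate
            missing.append(path)
    return missing


if __name__ == "__main__":
    try:
        from nltk.data import find
    except ImportError:
        print("NLTK is not installed")
    else:
        # Assumed resource paths; adjust them to the data your lessons need.
        wanted = ["corpora/brown", "help/tagsets"]
        print("Missing:", missing_resources(wanted, find))
```

An empty list means every requested resource was found. Anything listed can be fetched again by passing its package name to nltk.download.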
When the installation is complete, close the NLTK Downloader and check your installation. You need to be in a Python environment such as an interpreter or Jupyter notebook.
In your Python environment, run the following code:
from nltk.corpus import brown
If your code runs and nothing happens (no error message and nothing printed to the screen), congratulations!
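NLTK loads corpora lazily, so the import above can succeed even when the Brown data itself is missing. A slightly stronger check is to read a few words from the corpus. This is a sketch only, and the function name brown_is_readable is hypothetical.

```python
def brown_is_readable() -> bool:
    """Return True if the Brown corpus can actually be read."""
    try:
        from nltk.corpus import brown

        # Touching the corpus forces NLTK to locate the data on disk.
        brown.words()[:5]
        return True
    except (ImportError, LookupError):
        # ImportError: NLTK is not installed; LookupError: the data is missing.
        return False


print("Brown corpus readable:", brown_is_readable())
```

If this prints False although NLTK is installed, rerun the downloader for the brown package.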
Check that the books corpus installed properly by typing:
from nltk.book import *
If installed successfully, you should see the following:
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
Check that the part-of-speech tagger is installed correctly by typing the following:
nltk.help.upenn_tagset('NN')
If installed successfully, you should see the following:
NN: noun, common, singular or mass
common-carrier cabbage knuckle-duster Casino afghan shed thermostat
investment slide humour falloff slick wind hyena override subhumanity
machinist ...
If you get an error that includes NameError: name 'nltk' is not defined, type import nltk and hit return. Then try nltk.help.upenn_tagset('NN') again.