I should have realised it was a mistake to start playing with a Python natural-language parsing package at two in the morning.
Quite amazing: it's literally four lines of code (which could be shortened to one, as I don't need the intermediate variables) to take a piece of English text, parse it into a representation of its linguistic structure, analyse that to identify different types of entities (e.g. distinguishing proper names from common nouns), then go over that and identify even more specific types of entities.
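A minimal sketch of that pipeline, assuming the package in question is NLTK (the text above doesn't name it, so this is a guess based on the tokenise, tag, chunk steps described). The function names `named_entities` and `entities_with_label` are made up for illustration, and the four lines in question would be the three NLTK calls plus the final filtering step.

```python
def named_entities(text, label="GPE"):
    """Return the distinct entity strings in `text` that NLTK chunks under `label`.

    Assumption: NLTK is installed and its tokeniser, tagger and chunker data
    have already been fetched with nltk.download(); the resource names vary
    between NLTK versions.
    """
    # Imported lazily so the helper below stays usable without NLTK installed.
    import nltk

    tokens = nltk.word_tokenize(text)  # split the text into words and punctuation
    tagged = nltk.pos_tag(tokens)      # part-of-speech tags, e.g. NNP (proper noun) vs NN (common noun)
    tree = nltk.ne_chunk(tagged)       # group tagged tokens into labelled entity chunks
    return entities_with_label(tree, label)


def entities_with_label(tree, label):
    """Collect the text of every top-level chunk in `tree` whose label matches."""
    found = set()
    for node in tree:
        # Entity chunks are subtrees with a label(); ordinary tokens are (word, tag) tuples.
        if hasattr(node, "label") and node.label() == label:
            found.add(" ".join(word for word, _tag in node.leaves()))
    return sorted(found)
```

Under those assumptions, something like `named_entities(open("novel.txt").read())` (a hypothetical filename) would list the places, and passing `label="PERSON"` would switch to people. NLTK's `GPE` label stands for "geo-political entity", which is broader than a strict placename.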
With those four lines, it analysed a ~200-page novel in under five minutes (on my 2011 MBA) and output a list of all the names in it that are placenames somewhere in the world. There were a couple of false positives for characters whose names are also placenames, and it also included adjectives like "British", but still.
It even recognised "Euston" and "Victoria Bridge"; and when I tried it on identifying people instead of places, it correctly spotted "Christopher Columbus", "Caesar", and "Cassivelaunus".