Natural Language Processing with Swift River

Ushahidi
Mar 7, 2010

One of the core features of Swift River is the Language Computation Core, or SiLCC as we like to call it (Swift Language Computation Component). Users send feeds to SiLCC, which uses a number of machine learning techniques to parse the incoming text and extract relevant keywords. The idea is that these keywords (tags) can then be used to infer taxonomic relationships between content items. Some camps refer to this as semantic programming, others refer to it as artificial intelligence, but the general concept remains the same: helping programs to perform tasks based on a growing series of complex conditions. In this case, that task is 'auto-tagging' or 'predictive tagging' based on conditions learned from user behavior and preset rules.

The diagrams below illustrate how this dataflow works. Text passing through SiLCC is parsed, tags are extracted, and those tags are then reapplied in the Swift River UI. There, Swift attempts to build relationships between tags. (For example, items tagged "chile" and "earthquake" are likely related, whereas items tagged "chili" and "earthquake" likely are not.) Of course, other factors are also considered, such as the date, the time, the point of origin and the location of the content creator.

Swiftriver – SiLCC Dataflow
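
To make the dataflow a little more concrete, here is a minimal Python sketch of the "extract tags, then relate items by shared tags" idea. It is not SiLCC's actual code: the stopword list, the frequency-based tagger and the names `extract_tags` and `tag_overlap` are all assumptions made up for illustration, and a simple word count stands in for the machine learning that SiLCC uses.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real tagger would use a far larger one.
STOPWORDS = {"a", "an", "the", "in", "of", "and", "to", "is", "are", "for", "on", "at"}


def extract_tags(text, max_tags=5):
    """Return the most frequent non-stopword words in the text as lowercase tags."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [tag for tag, _ in counts.most_common(max_tags)]


def tag_overlap(tags_a, tags_b):
    """Jaccard similarity of two tag lists: 0.0 means no shared tags, 1.0 means identical sets."""
    a, b = set(tags_a), set(tags_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


item_1 = extract_tags("Earthquake in Chile: strong earthquake hits Chile coast")
item_2 = extract_tags("Chile earthquake survivors need water")
item_3 = extract_tags("Best chili recipe for a cold night")

print(tag_overlap(item_1, item_2))  # shares "chile" and "earthquake", so above zero
print(tag_overlap(item_1, item_3))  # "chili" is not "chile", so no shared tags
```

Exact tag matching is what keeps "chili" and "chile" apart here. A fuller version would also weigh the other signals mentioned above, such as date, time and location.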

One of the services running within SiLCC is SLISa, which we like to call Lisa (because the 's' is silent, hehe). SLISa is the Swift Language Improvement Service App, and it trains SiLCC to learn from user interaction. When users of Swift edit tags or flag them as inaccurate, SLISa is the service that creates all the conditions that help SiLCC learn from its mistakes and improve in the future.

Swiftriver – SLISa Dataflow
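
As a rough illustration of that feedback loop, the sketch below keeps a running score per tag and stops suggesting tags that users keep flagging. This is only an assumption about how such a service could work, not a description of SLISa itself: the `TagFeedback` class, its method names and the 0.4 threshold are hypothetical, and the score is a plain smoothed ratio of accepted to total votes.

```python
from collections import Counter


class TagFeedback:
    """Hypothetical store of user feedback on suggested tags."""

    def __init__(self, threshold=0.4):
        self.accepted = Counter()
        self.rejected = Counter()
        self.threshold = threshold  # assumed cutoff below which a tag is dropped

    def record(self, tag, accurate):
        """Record one user vote: True if the tag was kept, False if it was flagged."""
        if accurate:
            self.accepted[tag] += 1
        else:
            self.rejected[tag] += 1

    def confidence(self, tag):
        """Smoothed share of positive votes; a tag with no votes scores 0.5."""
        a = self.accepted[tag]
        r = self.rejected[tag]
        return (a + 1) / (a + r + 2)

    def filter(self, tags):
        """Keep only the tags whose confidence is at or above the threshold."""
        return [t for t in tags if self.confidence(t) >= self.threshold]


feedback = TagFeedback()
for _ in range(3):
    feedback.record("chili", accurate=False)  # users flag this tag three times
feedback.record("earthquake", accurate=True)

suggested = ["chile", "chili", "earthquake"]
print(feedback.filter(suggested))
```

After three flags, "chili" falls below the threshold and is no longer suggested, while the unseen tag "chile" stays at the neutral starting score and is kept.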

SiLCC is an open source project being developed in Python using the pyNLP toolkit. There are several additional layers of text parsing that I haven't touched upon, including how SiLCC deals with SMS txtspk and Twitter picoformats like hashtags, but more on that in a future post! More on SiLCC at http://swift.ushahidi.com/extend/silcc/. If you have a passion for machine learning, large data sets, and intricate algorithms, you might also consider joining the Swift River Google Group or our public Skype Chat. The Alpha release of Swift River, Version 0.0.9 Rumba, will be available to the public on March 31, 2010. Developers can always find the latest working build and issue tracker at http://github.com/ushahidi/Swiftriver.