How Media Pitstop Works (And Its Limitations)

Media Pitstop was built to make it a bit easier to keep up with what’s going on in the Australian media industry. It works by identifying the subjects in trade press headlines and then organising those headlines by subject, so you can more easily find headlines about what interests you specifically.

How it works

The site uses a combination of deterministic automation and probabilistic AI to collect, analyse, and classify headlines from across the Australian trade press. Currently, headlines are captured from AdNews, AFR Media & Marketing, B&T (Media), Mediaweek, Mi3, and Mumbrella.

It is currently in a ‘beta’-ish mode, where I’ll likely spend a few months seeing how it works in the wild and tweaking the settings accordingly. If you find something that doesn’t work, please connect and DM me via LinkedIn.

Limitations

The primary limitation is in the nature of LLMs themselves. You know how you see that phrase in some apps that says something like “results are AI generated and so may not be 100% accurate”? What that means is that all LLMs are inherently not 100% stable: because of their core nature, LLMs can never be 100% accurate, and over a long enough timeline they will make mistakes (or just make stuff up). I’ve set up a ton of backstops (that don’t use an LLM!) to catch the dafter outputs, but new ones pop up every now and again and might sneak through. If you find one, let me know via a DM on LinkedIn.
Some publishers restrict how headlines are collected, so there may be a delay in them appearing here. This is a result of technology restrictions that are in place. I have chosen to respect those restrictions, and the tools don’t attempt to circumvent them.
Some headlines may appear in the “wrong” category. While the AI is told where headlines should appear, the LLM may occasionally place a headline in a category that doesn’t make sense, e.g. a headline about Seven might appear under the Agencies category instead of the Media Owner category. However, multiple data runs are in place to minimise the risk that a headline appears only in the wrong place. As a result, you are likely to end up with more, not fewer: if a headline does appear in the “wrong” category, it will most likely appear in the “right” category as well. The output data also goes through regular ‘cleaning’, so the next refresh usually solves it.
Naming conventions for people and businesses depend on the publication. An agency like M+C Saatchi or a person like Kyle Sandilands may also be referred to in a headline as ‘M&C Saatchi’, or just as ‘Kyle’. I have processes in place to catch these, but some will inevitably slip through. Again, a refresh later in the day most often fixes it.
Some headlines may appear twice due to updates in the accompanying URL. Publishers will sometimes amend their originally published URL to better achieve their SEO goals. When that happens, what looks like a duplicate entry may appear in the daily list, and one of the two URLs may return a 404 error. Finally, sometimes (it happens!) two titles will publish two news items with the same headline. It might look like a duplicate, but there are in fact two destinations.
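
For the technically curious, here is a rough Python sketch of the kind of non-LLM backstop that the naming and duplicate limitations above call for. It is not the site’s actual code. The alias table, the function names, and the example URLs are all made up for illustration, and a real alias list would need more care (mapping every bare ‘Kyle’ to one person is obviously risky).

```python
import re
from collections import defaultdict

# Hypothetical alias table for illustration only; the site's real mappings
# are not published. Keys are lower-cased name variants.
ALIASES = {
    "m&c saatchi": "M+C Saatchi",
    "m+c saatchi": "M+C Saatchi",
    "kyle": "Kyle Sandilands",  # risky in practice: assumes one well-known Kyle
    "kyle sandilands": "Kyle Sandilands",
}


def _normalise(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


def canonical_name(name: str) -> str:
    """Map a name variant to one canonical spelling.

    Names that are not in the alias table pass through unchanged
    (apart from trimming surrounding whitespace).
    """
    return ALIASES.get(_normalise(name), name.strip())


def group_lookalikes(items):
    """Find headlines that one publisher lists under more than one URL.

    `items` is an iterable of (publisher, headline, url) tuples. The same
    headline from two different publishers is deliberately NOT grouped,
    because those are two genuine destinations.
    """
    groups = defaultdict(set)
    for publisher, headline, url in items:
        groups[(publisher, _normalise(headline))].add(url)
    # Keep only groups with more than one distinct URL: a likely amended URL.
    return {key: sorted(urls) for key, urls in groups.items() if len(urls) > 1}
```

Anything returned by `group_lookalikes` is only a candidate for review or a later clean-up pass. Deciding which of the two URLs still works would need a separate check.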