Looking for the best way to store and manage tags in the HPR database, part 3
Hosted by Dave Morriss on 2017-04-14 is flagged as Explicit and is released under a CC-BY-SA license.
Listen in ogg, spx, or mp3 format. | Comments (8)
This is the third (and last) show looking at the subject of Managing Tags relating to HPR shows.
In the first show we looked at why we need tags, and examined the advantages and disadvantages of the present system of storage. We considered the drawbacks of this design when searching the tags.
Then in the second show we looked at a simple way of making a tags table and how to query it in order to fulfil the requirements defined in the first show.
In this show we’ll look at a more rigorous, efficient, “normalised” solution.
I have written out a set of longer notes for this episode and these are available here.
Subscribe to the comments RSS feed.
Comment #1 posted on 2017-04-19T14:42:13Z by Steve
Make it so
As someone who is also not formally trained in database administration but nonetheless does quite a bit of database administration and development, what you have said and the conclusions you have drawn sound exactly right to me. I say, make it so.
Comment #2 posted on 2017-04-19T15:49:44Z by Dave Morriss
I appreciate the comment.
I'm currently looking into how we can incorporate such features into the database - and modify all of the code around it.
Comment #3 posted on 2017-04-19T22:03:58Z by gws
A series is the same thing as a tag, if you need to distinguish them put another column on the Tag table. The join across Episode Tag is the same.
Comment #4 posted on 2017-04-19T22:28:40Z by Dave Morriss
Series same as Tag?
I think you have a point - except that the Series idea was originally designed to have two other significant attributes: a description and a public/private flag.
The description is an arbitrarily long text field, used to store HTML which is displayed in the web page for shows that are part of the series.
The public/private flag denotes whether the series is open to more contributors or not. Most modern series are public but some historical ones are private.
Changing the Tag table to include these attributes, to be used for "series" tags, is not impossible of course. It needs some thought.
Thanks for the suggestion.
Comment #5 posted on 2017-04-21T01:40:09Z by gws
tag vs. series
Variable-length columns like VARCHAR or CLOB should not balloon the size of your Tag table just by adding them; assuming even a moderately sane dbms those large and sparse objects would be stored in separate data structure (thing 'string pool') so you pay for what you use.
BTW my earlier comment was meant to say "Episode (left arrow) EpTag (right arrow) Tag" but I used angle brackets and the middle bit got swallowed by HTML.
Comment #6 posted on 2017-04-23T22:37:23Z by Brenda J. Butler
I'm new to HPR, sorry if I make comments that show ignorance of how you do things. Please point me to resources, I'll be happy to read up.
I'm not a big DB expert either, but like you have used some DBs and have a little experience. Also a bit of experience making a couple of database-backed web sites.
I like the idea of the third design of tags.
I would also change the HPR episode intake process to make any new tags in the new format - have a cutover date/time after which all the new entries in the database use the new tagging scheme (populate the new tags tables and not the old tags fields. In fact, even remove the old tags fields to avoid confusion about which set of tags is the "right" set). That way you only have to do that "populate the new tag fields from the old tag fields" step once, at cutover time. You could keep a copy of the old site (and update it) for a while until confident the new site works properly.
I don't know how the HPR web site is served, I got the impression from your series that it is static pages generated from a DB. Perhaps you generate a new set of pages when a new episode is added to the DB. I think you cannot go this way if you want to use that query you developed, "what other shows have at least one of the tags that this show has". Or at least, it will be difficult to implement.
Can I read somewhere about the way the web site is served, the tech stack, etc? Is there a public repo for the code (read-only acceptable)?
Thanks for all your great, extensive show notes! Really appreciated.
Comment #7 posted on 2017-04-24T22:00:52Z by Dave Morriss
Tags and Series
Thanks gws. It wasn't so much the storage issue I was referring to, more the logic of the suggested change. I do like what you're proposing though.
Sorry you got bitten by this crappy comment system.
Comment #8 posted on 2017-04-24T22:22:00Z by Dave Morriss
Thanks for your comments.
There is a GitLab instance with a repository which holds some of the public code:
Your suggestion for the transition from the old to a new tag system is pretty much what I had in mind, but we haven't yet discussed all the issues amongst the Admins.
The site is not static, though there have been discussions about making it so. I take your point that there's a conflict between having a static site and offering tag query features though.