Data clean up...

This is the place to discuss the episodes of the Comic Book Page podcast, the Comic Book Page website or pretty much anything else of interest to the Comic Book Page community...

Moderator: JohnMayo

JohnMayo
Host/Owner
Posts: 3296
Joined: Mon Mar 12, 2007 3:12 pm
Location: Texas

Post by JohnMayo »

I'm in the process of completely reworking my data clean up process.

This has involved essentially starting over from scratch as many of the fundamental assumptions that I used originally no longer hold as true as they once did. Various things have been causing more and more problems with the monthly number crunching and it finally got to the point that I decided to attack the problem head on.

The approach I'm using is essentially a brute force process as there is really no other way to attempt to standardize the data coming out of Diamond. And, in fairness to Diamond, a decent part of the problem is the data going into Diamond from the publishers.

I've been able to make some major progress on a routine to take the description data from the monthly sales estimates lists, the monthly consumer order forms, the monthly cancellation lists and the weekly shipping lists and get them into a common format with a reasonably common set of titles.
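
To give a rough idea of what "a common format" could mean here, below is a minimal Python sketch that splits a raw description into a title, an issue number and any trailing notes. The description shape (a title, then "#" and a number, then optional notes such as "(OF 6)") is an assumption for illustration, and parse_description is a hypothetical name, not the actual routine.

```python
import re

# Assumed shape of a description: TITLE, then "#" and an issue number,
# then optional trailing notes such as "(OF 6)". This is a guess at the
# format for illustration only.
DESCRIPTION = re.compile(r"^(?P<title>.+?)\s*#\s*(?P<issue>\d+)\b(?P<rest>.*)$")

def parse_description(raw):
    """Return a dict with a normalized title, issue number and notes."""
    # Uppercase and collapse runs of whitespace so the same title from
    # different lists compares equal.
    text = " ".join(raw.upper().split())
    match = DESCRIPTION.match(text)
    if match is None:
        # No issue number found (for example a collected edition).
        return {"title": text, "issue": None, "notes": ""}
    return {
        "title": match.group("title").strip(),
        "issue": int(match.group("issue")),
        "notes": match.group("rest").strip(),
    }
```

Running every list through one function like this means the later steps only ever see one shape of record, whichever list it came from.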

One of the more challenging problems I've been working on lately is the multiple runs of a title. The data I get has no indication of which volume a given title is on, so it is far too easy for my current number crunching system to combine the sales of two different comic books from two different runs of the same title because they have the same issue number.
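
One possible way around this, sketched in Python under some loud assumptions: keep a hand-maintained table of when each run of a title started, look up the volume from the month of the list, and total sales by (title, volume, issue) instead of (title, issue). RUN_STARTS, volume_for and total_sales are hypothetical names, and the title and dates in the table are made up. It also assumes runs of a title never overlap in time, which will not hold when an ongoing series and a mini-series ship in the same months.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical, hand-maintained table: title -> sorted list of
# (first month of the run, volume number). Months are "YYYY-MM" strings,
# which sort chronologically. The title and dates are placeholders.
RUN_STARTS = {
    "Example Title": [("1990-05", 1), ("2006-07", 2)],
}

def volume_for(title, month):
    """Pick the latest run that had started by the given month."""
    runs = RUN_STARTS.get(title)
    if not runs:
        return 1  # Titles with no entry are treated as a single run.
    starts = [start for start, _ in runs]
    idx = bisect_right(starts, month) - 1
    return runs[max(idx, 0)][1]

def total_sales(records):
    """Sum units per (title, volume, issue) so runs stay separate."""
    totals = defaultdict(int)
    for title, issue, month, units in records:
        totals[(title, volume_for(title, month), issue)] += units
    return dict(totals)
```

With the volume in the key, issue #1 of the first run and issue #1 of the second run end up in separate totals even though the title and issue number match.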

I'm always surprised when people comment on the amount of math that must be involved in dealing with the sales estimates. While there is some math, most of it is fairly simple stuff like basic addition. The real challenge is getting the data cleaned up so that the math can be done.

The new clean up routine is far from perfect and none of this cleaner data has hit the website yet.

I welcome any questions and constructive comments on this...
Comic Book Page: Website || Podcast || RSS || Episodes Archive
Lobo
Reviewer
Posts: 131
Joined: Thu Aug 30, 2007 1:34 pm
Location: Rhode Island

Post by Lobo »

Ghost Rider could use a clean up.

I can't tell which numbers are from the '90s series, the current ongoing series, and the Garth Ennis mini-series.
Co-host of the Kryptographik podcast, providing commentary, news,
reviews and interviews for fans of Horror, Dark Fantasy and Science Fiction.
http://www.lordshaper.com/kryptographik/
http://www.myspace.com/hellstorm_kgk
http://kryptographik.ning.com/

Post by JohnMayo »

I've added Ghost Rider to the list of titles to check and clean up. Hopefully I'll get some time over the holidays to work on this.

Post by Lobo »

Thanks. Someone on the ComicMonsters.com forum was asking about how the sales of the comic were affected by the release of the movie in theaters and on DVD, and so naturally I came to the source.

Post by Lobo »

I'm pretty sure all of these are the same company.

IDW
Idea/Design
Idea & Design Works
Idea and Design Works
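
For what it's worth, spellings like these could be folded together with a small alias table keyed on a normalized form of the name. A minimal Python sketch; the table contents, the canonical_publisher name and the choice of "IDW" as the label to keep are all assumptions for illustration, not how the site actually handles it.

```python
import re

# Hypothetical alias table: normalized spelling -> publisher name to keep.
PUBLISHER_ALIASES = {
    "idw": "IDW",
    "idea design": "IDW",
    "idea design works": "IDW",
}

def _normalize(name):
    """Lowercase, drop '&', '/' and the word 'and', collapse whitespace."""
    text = name.lower().replace("&", " ").replace("/", " ")
    text = re.sub(r"\band\b", " ", text)
    text = re.sub(r"[^a-z0-9 ]", " ", text)
    return " ".join(text.split())

def canonical_publisher(name):
    """Map a known spelling to one name; leave unknown names unchanged."""
    return PUBLISHER_ALIASES.get(_normalize(name), name.strip())
```

All four spellings above come back as "IDW", while a name that is not in the table passes through untouched so nothing gets merged by accident.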
