A futuristic data analytics workspace with multiple monitors displaying abstract data visualizations, Redshift database insights, and counterfeit detection tracking. A data analyst is working on a complex query with legal documents open beside them. A large digital map of the USA highlights a specific state under review. Holographic screens showcase shipment timelines and vendor details, emphasizing cybersecurity and data accuracy.

Counterfeit Detection and Legal Compliance: Optimizing Data Analytics with Redshift

Categories

Sign up for our newsletter

We care about the protection of your data. Read our Privacy Policy.

Information

The Challenge

Technical Implementation

Identify all retail vendors and ASINs (unique item IDs) that received a counterfeit complaint in the past four years.

During that timeframe, complaints were manually kept in an Excel sheet only. Deduplicate all vendor & ASIN combinations and create a static table to join with other data tables.

A prior analyst had created a bare-bones query. Upon review, the query was using incorrect tables. When an order is placed, an item gets assigned for accounting and inventory management purposes. However, the warehouse may pick the same item from a bin that does not necessarily belong to the assigned shipment or vendor. Therefore, the query had to be completely rewritten. Legal was informed that the previous rough estimates might not be correct.

Our systems were not connected to the inventory receiving system, making it impossible to track all items from the same shipment. I met with the inventory analyst, and together we worked out the average number of days it takes to sell out a batch by vendor to continue the analysis. The average was 60 days, but it was still impossible to determine whether the counterfeit complaint was received at the beginning of the shipment or after all items from that shipment had already been sold.

Our Redshift cluster did not store data that far back, so the data had to be queried directly from the Data Lake. However, the available tool was not designed to handle such complex queries directly from the Data Lake. I had to aggregate the data manually and load it into Redshift to continue the analysis.

Since I did not know the inventory batch volume or the exact timeline of when each item from a specific batch was sold, I presented our legal team with the following three approaches:

i. All items shipped to that state 60 days prior to the complaint date.

ii. All items shipped to that state within a dynamic date range: 30 days before and 30 days after the complaint date.

iii. All items shipped to that state 90 days before vendor enforcement.

Another hurdle was drop shipments, which included all items sent directly from the vendor to the customer. These types of shipments receive different vendor codes, which had to be manually reviewed and added to the static table created with an additional column differentiating regular orders from drop shipments.

Finalize the query, summarize the output, and send it to legal along with all pros and cons of each approach. Additionally, document all steps taken to complete this analysis in case it needs to be replicated.

Results

Subscribe to our newsletter

Stay informed with the latest insights, industry trends, and expert tips delivered straight to your inbox. Sign up for our newsletter today and never miss an update!

We care about the protection of your data. Read our Privacy Policy.

Keep reading

Dig deeper into data development by browsing our blogs…

Get in Touch

Let us leverage your data so that you can make smarter decisions. Talk to our team of data experts today or fill in this form and we’ll be in touch.