Tuesday, September 17, 2019

USPA.tech - the Open Source Framework to build your own Customer Data Platform (CDP)


USPA framework - Logical Data Flow


The USPA framework is hybrid open source software for building a Customer Data Platform (CDP) more simply and quickly. Its primary audience is media agencies, digital companies, tech startups, e-commerce platforms, marketing scientists, and software developers. I hope my work helps anyone who wants to build a CDP in a predictable way.
Another reason for this project is my online course, "Introduction to Big Data, Marketing Science, CDP and Ad Tech".

This technical architecture can scale up to 10 million daily active users (DAUs) or more, depending on how you set it up with a cloud provider (AWS, Google Cloud, ...).

Source code: https://github.com/bigdatavietnam-org/USPA.tech


First of all, you have to build an Owned Digital Media Hub (e.g. a simple website with useful content, social media channels, ...).

There are 9 modules in the USPA framework:

Open Source Modules:
  • (1) Data Collector: 
  • (2) Unified Data System: collects data from multiple sources
  • (3) Data Processing System: the core logic of the CDP
  • (4) Database Systems
    • (4.1) Log Database: stores event logs (pageview, click, play, ...)
    • (4.2) Context Data Enrichment: enriches context data (IP to location, web URL to topic models)
    • (4.3) Profile Database: stores all customer profiles
    • (4.4) Profile Data Enrichment: enriches customer data with persona prediction, lifestyle, demographics, ...
  • (5) API Gateway: controls security and how other systems can pull customer data for business use
  • (6) Operations Dashboard: the main tool for digital marketers, data analysts, and business managers
Closed Source Modules:

  • (7) Predictive Analytics: the data mining system for profile learning, which uses machine learning to classify and cluster data into useful segments
  • (8) Personalization System: builds the customer interest graph and matches product metadata with customer profiles
  • (9) Personalized Content Delivery System: micro-targeted marketing with personalized content for customer engagement
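
To give a rough feel for how an event might travel through modules (1) to (4), here is a minimal Python sketch. It is only an illustration under my own assumptions: the names `Event`, `enrich_context`, `ProfileStore`, and the `IP_TO_LOCATION` table are hypothetical and are not taken from the USPA source code, and an in-memory dictionary stands in for the real databases.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical lookup table standing in for a real IP-to-location service (4.2).
IP_TO_LOCATION = {"203.0.113.7": "Ho Chi Minh City"}


@dataclass
class Event:
    """A raw tracking event as a data collector (1) might receive it."""
    visitor_id: str
    action: str  # e.g. "pageview", "click", "play"
    url: str
    ip: str
    context: dict = field(default_factory=dict)


def enrich_context(event):
    """Context enrichment (4.2): attach a location derived from the IP."""
    event.context["location"] = IP_TO_LOCATION.get(event.ip, "unknown")
    return event


class ProfileStore:
    """In-memory stand-in for the log database (4.1) and profile database (4.3)."""

    def __init__(self):
        self.event_log = []
        self.profiles = defaultdict(
            lambda: {"event_counts": defaultdict(int), "locations": set()}
        )

    def process(self, event):
        """Processing step (3): enrich and log the event, then update the profile."""
        event = enrich_context(event)
        self.event_log.append(event)
        profile = self.profiles[event.visitor_id]
        profile["event_counts"][event.action] += 1
        profile["locations"].add(event.context["location"])


store = ProfileStore()
store.process(Event("v1", "pageview", "https://example.com/post", "203.0.113.7"))
store.process(Event("v1", "click", "https://example.com/post", "203.0.113.7"))
store.process(Event("v2", "pageview", "https://example.com/", "198.51.100.1"))
```

In a real deployment each of these steps would likely be a separate service with its own storage; the sketch only shows the order in which data is collected, enriched, logged, and merged into a profile.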

Friday, September 13, 2019

Customer Data Platform (CDP) - Industry 4.0

When did the customer-centric business model appear?
The customer-centric philosophy in marketing has been around for a long time; it dates back to when people traded at markets and bartered goods. However, once society industrialized (Industry 2.0 and 3.0) and mass production emerged, marketers shifted their focus from people to products (product-first) and only afterwards turned to marketing to find customers. It was at this point (the early 20th century) that brand-driven marketing and trade marketing appeared, so that companies could concentrate solely on managing, producing, and controlling the distributors of their products.

From the early 20th century, under the classical business management model of Peter Drucker, the role of a business leader such as a CEO was simply to manage people and capital, develop the market, and focus on revenue. This led to many great recessions and economic crises because goods were produced beyond the point where supply and demand balance (demand, supply, and market equilibrium). It reflects production driven by the subjective will of the company's executive board (the wish to put profit above all else) rather than by actual supply and demand data and what customers want.
Without collecting sufficient and accurate market data, companies cannot pinpoint the saturation point at which supply and demand meet.
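
To make the equilibrium point concrete, here is a small worked example in Python. It assumes the simplest textbook case of linear demand and supply curves; the function names and the coefficients are made up for illustration and do not come from real market data.

```python
def equilibrium(a, b, c, d):
    """Return (price, quantity) where demand Qd = a - b*p meets supply Qs = c + d*p."""
    if b + d == 0:
        raise ValueError("curves are parallel; no unique equilibrium")
    price = (a - c) / (b + d)
    quantity = a - b * price
    return price, quantity


def surplus(a, b, c, d, price):
    """Units supplied minus units demanded at a given price (positive = overproduction)."""
    return (c + d * price) - (a - b * price)


# Illustrative coefficients only.
price, quantity = equilibrium(a=100.0, b=2.0, c=10.0, d=1.0)
overproduction = surplus(100.0, 2.0, 10.0, 1.0, price=40.0)
```

With these numbers the curves meet at a price of 30 and a quantity of 40, while producing for a price of 40 leaves 30 unsold units. A company can only locate this point if it has measured the demand curve, which is exactly where market data comes in.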


How did marketing save the world from World War III? 😀

Fortunately, after the Great Depression of 1929-1933 (which was also a root cause of World War II), modern management models helped companies avoid repeating the mistakes of the "capitalism 1.0" mindset. The first generation of data-driven marketing models emerged, applying mathematical statistics and giving rise to the discipline of Market Research. This is most evident in the marketing model of Professor Philip Kotler's school, with the basic marketing philosophy found in every textbook:
He holds that marketing is an essential part of economics and sees demand as influenced not only by price but also by advertising, sales promotions, sales forces, direct mail, and various middlemen (agents, retailers, wholesalers, etc.) operating as sales and distribution channels. Source: https://en.wikipedia.org/wiki/Philip_Kotler
The new idea here is that, instead of only researching the market from customer data, marketers proactively build a multi-channel marketing media network. In other words, mass media is used to create a crowd psychology effect, which in turn influences purchasing power by creating new supply. This works because multi-channel media (the marketing mix) directly affects buyers' psychology and purchasing behavior. The following image illustrates a few of the main points:
With this model, marketers can create new markets and use marketing research to sample (select a sample from a set of customers), compute statistics, and predict purchasing behavior (purchasing power).
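
The sampling idea can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the customer population is synthetic, the function name `estimate_purchase_rate` is my own, and the interval uses the simple normal approximation rather than any method named in the text above.

```python
import math
import random


def estimate_purchase_rate(population, sample_size, seed=0):
    """Draw a simple random sample and estimate the share of buyers with a 95% interval."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    p_hat = sum(sample) / sample_size
    # Normal-approximation interval around the sample proportion.
    margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat, max(0.0, p_hat - margin), min(1.0, p_hat + margin)


# Synthetic population of 10,000 customers: 1 = purchased, 0 = did not (true rate 12%).
population = [1] * 1200 + [0] * 8800
p_hat, low, high = estimate_purchase_rate(population, sample_size=500)
```

Surveying 500 customers instead of all 10,000 gives an estimate of the purchase rate plus a range of plausible values, which is the basic trade-off behind market research sampling.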

Industry 4.0: how is the customer-centric business model being revived and creating new markets?
In the fourth industrial revolution (Industry 4.0), a revival of the customer-first business model is under way, in which customers (and their data) are the company's assets. This trend is irreversible.
The following images are worth a thousand words 😎

Source: https://innovator.news/the-platform-economy-3c09439b56

Source: https://www.superoffice.com/blog/how-to-create-a-customer-centric-strategy/
Source: https://blog.treasuredata.com/blog/2017/03/09/what-is-multi-touch-attribution/


Conclusion on the "Customer-first Business Model" philosophy

In a free market economy, corporations and nations connect with one another through free trade agreements; they proactively study the market and use data-driven marketing to connect with customers. This reduces the risk of economic crises caused by surplus goods, and capital also turns over faster. People will put data and the customer's heart at the center instead of waging war over land and resources.
What is the most valuable resource a company has? It is the hearts and data of its customers, which will help the company endure over time.


Detailed slides on the Customer Data Platform:
https://drive.google.com/file/d/1RJvigcd8Ws2480ssOJaF72wOZe3qAsQv/view?usp=sharing

Monday, September 9, 2019

Data Science Full Course - Learn Data Science in 10 Hours | Data Science For Beginners




This Edureka Data Science Full Course video will help you understand and learn Data Science Algorithms in detail. This Data Science Tutorial is ideal for both beginners as well as professionals who want to master Data Science Algorithms. Below are the topics covered in this Data Science for Beginners tutorial video:
2:44 Introduction to Data Science
9:55 Data Analysis at Walmart
13:20 What is Data Science?
14:39 Who is a Data Scientist?
16:50 Data Science Skill Set
21:51 Data Science Job Roles
26:58 Data Life Cycle
30:25 Statistics & Probability
34:31 Categories of Data
34:50 Qualitative Data
36:09 Quantitative Data
39:11 What is Statistics?
41:32 Basic Terminologies in Statistics
42:50 Sampling Techniques
45:31 Random Sampling
46:20 Systematic Sampling
46:50 Stratified Sampling
47:54 Types of Statistics
50:38 Descriptive Statistics
55:52 Measures of Spread
55:56 Range
56:44 Inter Quartile Range
58:58 Variance
59:36 Standard Deviation
1:14:25 Confusion Matrix
1:19:16 Probability
1:24:14 What is Probability?
1:27:13 Types of Events
1:27:58 Probability Distribution
1:28:15 Probability Density Function
1:30:02 Normal Distribution
1:30:51 Standard Deviation & Curve
1:31:19 Central Limit Theorem
1:33:12 Types of Probability
1:33:34 Marginal Probability
1:34:06 Joint Probability
1:34:58 Conditional Probability
1:35:56 Use-Case
1:39:46 Bayes Theorem
1:45:44 Inferential Statistics
1:56:40 Hypothesis Testing
2:00:34 Basics of Machine Learning
2:01:41 Need for Machine Learning
2:07:03 What is Machine Learning?
2:09:21 Machine Learning Definitions
2:11:48 Machine Learning Process
2:18:31 Supervised Learning Algorithm
2:19:54 What is Regression?
2:21:23 Linear vs Logistic Regression
2:33:51 Linear Regression
2:25:27 Where is Linear Regression used?
2:27:11 Understanding Linear Regression
2:37:00 What is R-Square?
2:46:35 Logistic Regression
2:51:22 Logistic Regression Curve
2:53:02 Logistic Regression Equation
2:56:21 Logistic Regression Use-Cases
2:58:23 Demo
3:00:57 Implement Logistic Regression
3:02:33 Import Libraries
3:05:28 Analyzing Data
3:11:52 Data Wrangling
3:23:54 Train & Test Data
3:20:44 Implement Logistic Regression
3:31:04 SUV Data Analysis
3:38:44 Decision Trees
3:39:50 What is Classification?
3:42:27 Types of Classification
3:42:27 Decision Tree
3:43:51 Random Forest
3:45:06 Naive Bayes
3:47:12 KNN
3:49:02 What is Decision Tree?
3:55:15 Decision Tree Terminologies
3:56:51 CART Algorithm
3:58:50 Entropy
4:00:15 What is Entropy?
4:23:52 Random Forest
4:27:29 Types of Classifier
4:31:17 Why Random Forest?
4:39:14 What is Random Forest?
4:51:26 How Random Forest Works?
4:51:36 Random Forest Algorithm
5:04:23 K Nearest Neighbour
5:05:33 What is KNN Algorithm?
5:08:50 KNN Algorithm Working
5:14:55 kNN Example
5:24:30 What is Naive Bayes?
5:25:13 Bayes Theorem
5:27:48 Bayes Theorem Proof
5:29:43 Naive Bayes Working
5:39:06 Types of Naive Bayes
5:53:37 Support Vector Machine
5:57:40 What is SVM?
5:59:46 How does SVM work?
6:03:00 Introduction to Non-Linear SVM
6:04:48 SVM Example
6:06:12 Unsupervised Learning Algorithms - KMeans
6:06:18 What is Unsupervised Learning?
6:06:45 Unsupervised Learning: Process Flow
6:07:17 What is Clustering?
6:09:15 Types of Clustering
6:10:15 K-Means Clustering
6:10:40 K-Means Algorithm Working
6:16:17 K-Means Algorithm
6:19:16 Fuzzy C-Means Clustering
6:21:22 Hierarchical Clustering
6:22:53 Association Clustering
6:24:57 Association Rule Mining
6:30:35 Apriori Algorithm
6:37:45 Apriori Demo
6:40:49 What is Reinforcement Learning?
6:42:48 Reinforcement Learning Process
6:51:10 Markov Decision Process
6:54:53 Understanding Q-Learning
7:13:12 Q-Learning Demo
7:25:34 The Bellman Equation
7:48:39 What is Deep Learning?
7:52:53 Why we need Artificial Neuron?
7:54:33 Perceptron Learning Algorithm
7:57:57 Activation Function
8:03:14 Single Layer Perceptron
8:04:04 What is Tensorflow?
8:07:25 Demo
8:21:03 What is a Computational Graph?
8:49:18 Limitations of Single Layer Perceptron
8:50:08 Multi-Layer Perceptron
8:51:24 What is Backpropagation?
8:52:26 Backpropagation Learning Algorithm
8:59:31 Multi-layer Perceptron Demo
9:01:23 Data Science Interview Questions

Data Analytics vs. Data Science

Forget about viewing it as data science vs. data analytics. Instead, we should see them as parts of a whole that are vital to understanding how to better analyze and review data.

Big data has become a major component in the tech world today thanks to the actionable insights and results businesses can glean. However, the creation of such large datasets also requires understanding and having the proper tools on hand to parse through them to uncover the right information. To better comprehend big data, the fields of data science and analytics have gone from largely being relegated to academia, to instead becoming integral elements of business intelligence and big data analytics tools.
However, it can be confusing to differentiate between data analytics and data science. Despite the two being interconnected, they provide different results and pursue different approaches. If you need to study data your business is producing, it's vital to grasp what they bring to the table and how each is unique. To help you optimize your big data analytics, we break down both categories, examine their differences, and reveal the value they deliver.

What Is Data Science?

Data science is a multidisciplinary field focused on finding actionable insights from large sets of raw and structured data. The field primarily fixates on unearthing answers to the things we don't know we don't know. Data science experts use several different techniques to obtain answers, incorporating computer science, predictive analytics, statistics, and machine learning to parse through massive data sets in an effort to establish solutions to problems that haven't been thought of yet.
Data scientists' main goal is to ask questions and locate potential avenues of study, with less concern for specific answers and more emphasis placed on finding the right question to ask. Experts accomplish this by predicting potential trends, exploring disparate and disconnected data sources, and finding better ways to analyze information.

What Is Data Analytics?

Data analytics focuses on processing and performing statistical analysis on existing datasets. Analysts concentrate on creating methods to capture, process, and organize data to uncover actionable insights for current problems, and establishing the best way to present this data. More simply, the field of data analytics is directed towards solving problems for questions we know we don't know the answers to. More importantly, it's based on producing results that can lead to immediate improvements.
Data analytics also encompasses a few different branches of broader statistics and analysis which help combine diverse sources of data and locate connections while simplifying the results.

What Is the Difference?

While many people use the terms interchangeably, data science and big data analytics are unique fields, with the major difference being the scope. Data science is an umbrella term for a group of fields that are used to mine large data sets. Data analytics is a more focused version of this and can even be considered part of the larger process. Analytics is devoted to realizing actionable insights that can be applied immediately based on existing queries.
Another significant difference in the two fields is a question of exploration. Data science isn't concerned with answering specific queries, instead parsing through massive data sets in sometimes unstructured ways to expose insights. Data analysis works better when it is focused, having questions in mind that need answers based on existing data. Data science produces broader insights that concentrate on which questions should be asked, while big data analytics emphasizes discovering answers to questions being asked.
More importantly, data science is more concerned about asking questions than finding specific answers. The field is focused on establishing potential trends based on existing data, as well as realizing better ways to analyze and model data.
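
One hypothetical way to picture the distinction in code: the analytics step answers a specific, already-posed question from existing data, while the more exploratory step fits a trend and projects it forward. The weekly order counts and the helper `fit_line` below are made up for illustration, and a least-squares line is only one of many modeling choices.

```python
from statistics import mean

# Made-up weekly order counts for six consecutive weeks.
weekly_orders = [120, 132, 141, 155, 160, 174]

# Analytics-style question with a direct answer: what was the average weekly volume?
average_orders = mean(weekly_orders)


def fit_line(ys):
    """Ordinary least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    xs = list(range(len(ys)))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Exploration-style step: model the trend, then ask what it implies for next week.
slope, intercept = fit_line(weekly_orders)
next_week_forecast = slope * len(weekly_orders) + intercept
```

The first result describes what already happened; the second suggests a new question to investigate, such as whether the upward trend will hold.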
The two fields can be considered different sides of the same coin, and their functions are highly interconnected. Data science lays important foundations and parses big data sets to create initial observations, future trends, and potential insights that can be important. This information by itself is useful for some fields, especially modeling, improving machine learning, and enhancing AI algorithms, as it can improve how information is sorted and understood. However, data science asks important questions that we were unaware of before while providing little in the way of hard answers. By adding data analytics into the mix, we can turn those things we know we don't know into actionable insights with practical applications.
When thinking of these two disciplines, it's important to forget about viewing them as data science vs. data analytics. Instead, we should see them as parts of a whole that are vital to understanding not just the information we have, but how to better analyze and review it.

Are data analytics and data science related?

There are many differences between them when you look closely.
Let's go through the differences between data science and data analytics, starting with the definitions.
Data Scientist vs Data Analyst according to Definition
  • A data scientist's role is to predict the future based on past patterns, while a data analyst finds meaningful information in data.
  • A data scientist generates their own questions, whereas a data analyst finds answers to questions posed by others.
  • Data scientists explore the "what ifs", while data analysts do the day-to-day analysis.
  • A data scientist addresses business problems and also estimates the business value of solving them, whereas a data analyst only addresses the business problems.
  • A data scientist uses machine learning to extract information, while a data analyst uses tools such as R or SAS.
  • A data scientist explores and examines information from many disconnected sources, while a data analyst explores and examines data from a single source.
  • A data scientist's predictions can be highly accurate, up to 90%. Data analysts don't predict; they only answer the questions given by the business.
  • A data scientist formulates questions whose solutions are likely to benefit the business, while a data analyst only solves the questions given by the business.
  • A data scientist must have sound knowledge of statistical models and machine learning; a data analyst needs sound knowledge of SAS/R.
Data Analyst vs Data Scientist according to Responsibilities
a) Data Scientist Responsibilities
  • Data cleansing and processing.
  • Predicting business outcomes, i.e. forecasting future results for the business.
  • Developing machine learning models and analytical methods.
  • Finding new business questions that can add value to the business.
  • Data mining using state-of-the-art methods.
  • Presenting results clearly and doing ad-hoc analysis.
b) Data Analyst Responsibilities
  • Identifying data quality issues in data acquisition.
  • Solving business problems by mapping and tracing the data.
  • Coordinating with engineers to gather new data.
  • Performing statistical analysis of business data.
  • Documenting the types and structure of the business data.
Data Analyst vs Data Scientist roles based on skill sets
a) Data Scientist roles according to their skill sets
  • Data Creatives
  • Data Developers
  • Data Researchers
  • Data Businesspeople
b) Data Analyst roles according to their skill sets
  • Database Administrators
  • Operations
  • Data Architects
  • Data Analysts
Data Scientist vs Data Analyst – Salary
The statistics below show the salaries of data scientists vs. data analysts:
