𝗣𝗿𝗶𝗻𝗰𝗶𝗽𝗮𝗹 𝗖𝗼𝗺𝗽𝗼𝗻𝗲𝗻𝘁 𝗔𝗻𝗮𝗹𝘆𝘀𝗶𝘀 (𝗣𝗖𝗔)
𝗧𝗵𝗲 𝗔𝗿𝘁 𝗼𝗳 𝗥𝗲𝗱𝘂𝗰𝗶𝗻𝗴 𝗗𝗶𝗺𝗲𝗻𝘀𝗶𝗼𝗻𝘀 𝗪𝗶𝘁𝗵𝗼𝘂𝘁 𝗟𝗼𝘀𝗶𝗻𝗴 𝗜𝗻𝘀𝗶𝗴𝗵𝘁𝘀
𝗪𝗵𝗮𝘁 𝗘𝘅𝗮𝗰𝘁𝗹𝘆 𝗜𝘀 𝗣𝗖𝗔?
⤷ 𝗣𝗖𝗔 is a 𝗺𝗮𝘁𝗵𝗲𝗺𝗮𝘁𝗶𝗰𝗮𝗹 𝘁𝗲𝗰𝗵𝗻𝗶𝗾𝘂𝗲 used to transform a 𝗵𝗶𝗴𝗵-𝗱𝗶𝗺𝗲𝗻𝘀𝗶𝗼𝗻𝗮𝗹 dataset into fewer dimensions, while retaining as much 𝘃𝗮𝗿𝗶𝗮𝗯𝗶𝗹𝗶𝘁𝘆 (𝗶𝗻𝗳𝗼𝗿𝗺𝗮𝘁𝗶𝗼𝗻) as possible.
⤷ Think of it as “𝗰𝗼𝗺𝗽𝗿𝗲𝘀𝘀𝗶𝗻𝗴” data, similar to how we reduce the size of an image without losing too much detail.
𝗪𝗵𝘆 𝗨𝘀𝗲 𝗣𝗖𝗔 𝗶𝗻 𝗬𝗼𝘂𝗿 𝗣𝗿𝗼𝗷𝗲𝗰𝘁𝘀?
⤷ 𝗦𝗶𝗺𝗽𝗹𝗶𝗳𝘆 your data for 𝗲𝗮𝘀𝗶𝗲𝗿 𝗮𝗻𝗮𝗹𝘆𝘀𝗶𝘀 and 𝗺𝗼𝗱𝗲𝗹𝗶𝗻𝗴
⤷ 𝗘𝗻𝗵𝗮𝗻𝗰𝗲 machine learning models by reducing 𝗰𝗼𝗺𝗽𝘂𝘁𝗮𝘁𝗶𝗼𝗻𝗮𝗹 𝗰𝗼𝘀𝘁
⤷ 𝗩𝗶𝘀𝘂𝗮𝗹𝗶𝘇𝗲 multi-dimensional data in 2𝗗 or 3𝗗 for insights
⤷ 𝗙𝗶𝗹𝘁𝗲𝗿 𝗼𝘂𝘁 𝗻𝗼𝗶𝘀𝗲 and uncover hidden patterns in your data
𝗧𝗵𝗲 𝗣𝗼𝘄𝗲𝗿 𝗼𝗳 𝗣𝗿𝗶𝗻𝗰𝗶𝗽𝗮𝗹 𝗖𝗼𝗺𝗽𝗼𝗻𝗲𝗻𝘁𝘀
⤷ The 𝗳𝗶𝗿𝘀𝘁 𝗽𝗿𝗶𝗻𝗰𝗶𝗽𝗮𝗹 𝗰𝗼𝗺𝗽𝗼𝗻𝗲𝗻𝘁 is the direction in which the data varies the most.
⤷ Each subsequent component captures the 𝗻𝗲𝘅𝘁 𝗵𝗶𝗴𝗵𝗲𝘀𝘁 share of the remaining variance, while staying 𝗼𝗿𝘁𝗵𝗼𝗴𝗼𝗻𝗮𝗹 (𝘂𝗻𝗰𝗼𝗿𝗿𝗲𝗹𝗮𝘁𝗲𝗱) to all the components before it.
⤷ The challenge is selecting how many components to keep based on the 𝘃𝗮𝗿𝗶𝗮𝗻𝗰𝗲 they explain.
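One common way to pick the number of components is to keep adding them until the cumulative explained variance crosses a threshold. Here is a minimal NumPy sketch of that idea. The function name `components_for_variance`, the 90% threshold, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def components_for_variance(X, threshold=0.90):
    """Smallest k whose top-k components explain at least `threshold` of the variance."""
    # Standardize each feature so no column dominates because of its units
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # Eigenvalues of the covariance matrix are the variances along each component
    eigvals = np.linalg.eigvalsh(np.cov(Xs, rowvar=False))[::-1]  # descending order
    ratio = eigvals / eigvals.sum()           # explained variance ratio per component
    cumulative = np.cumsum(ratio)
    # First position where the running total reaches the threshold
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(ratio)), ratio

# Synthetic data (assumption): 5 features driven by 2 hidden factors plus a little noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(500, 5))

k, ratio = components_for_variance(X, threshold=0.90)
print("components kept:", k)
print("explained variance ratio:", np.round(ratio, 3))
```

The threshold is a judgment call: a higher value keeps more information but also more dimensions.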
𝗣𝗿𝗮𝗰𝘁𝗶𝗰𝗮𝗹 𝗘𝘅𝗮𝗺𝗽𝗹𝗲 1: 𝗖𝘂𝘀𝘁𝗼𝗺𝗲𝗿 𝗦𝗲𝗴𝗺𝗲𝗻𝘁𝗮𝘁𝗶𝗼𝗻
Imagine you’re working on a project to 𝘀𝗲𝗴𝗺𝗲𝗻𝘁 customers for a marketing campaign, with data on spending habits, age, income, and location.
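⤷ Using 𝗣𝗖𝗔, you can reduce these four variables (with location encoded numerically) to just 𝘁𝘄𝗼 𝗽𝗿𝗶𝗻𝗰𝗶𝗽𝗮𝗹 𝗰𝗼𝗺𝗽𝗼𝗻𝗲𝗻𝘁𝘀. If the variables are strongly correlated, those two components might retain around 90% of the variance; the exact figure depends on your data.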
⤷ These two new components can then be used for 𝗸-𝗺𝗲𝗮𝗻𝘀 clustering to identify distinct customer groups without dealing with the complexity of all the original variables.
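A rough sketch of this workflow with scikit-learn could look like the following. All of the data here is synthetic and hypothetical: the feature distributions, the use of a numeric `distance_km` column as a stand-in for location, and the choice of 3 clusters are assumptions, so the variance retained will not match the 90% figure above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 300

# Hypothetical customer features (assumptions, not real data)
income = rng.normal(60_000, 15_000, n)
age = rng.normal(40, 12, n)
spending = 0.05 * income + rng.normal(0, 500, n)   # spending loosely tied to income
distance_km = rng.exponential(10, n)               # numeric stand-in for location
X = np.column_stack([spending, age, income, distance_km])

# Standardize, then project the four variables onto two principal components
reducer = make_pipeline(StandardScaler(), PCA(n_components=2))
X_2d = reducer.fit_transform(X)
retained = reducer.named_steps["pca"].explained_variance_ratio_.sum()
print(f"variance retained by 2 components: {retained:.1%}")

# Cluster customers in the reduced 2D space
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)
print("customers per cluster:", np.bincount(labels))
```

Checking `retained` before clustering is worthwhile: if two components keep only a small share of the variance, the clusters may miss structure that lives in the discarded dimensions.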
𝗧𝗵𝗲 𝗣𝗖𝗔 𝗣𝗿𝗼𝗰𝗲𝘀𝘀 — 𝗦𝘁𝗲𝗽-𝗕𝘆-𝗦𝘁𝗲𝗽
⤷ 𝗦𝘁𝗲𝗽 𝟭: 𝗗𝗮𝘁𝗮 𝗦𝘁𝗮𝗻𝗱𝗮𝗿𝗱𝗶𝘇𝗮𝘁𝗶𝗼𝗻
Ensure your data is on the same scale (e.g., mean = 0, variance = 1).
⤷ 𝗦𝘁𝗲𝗽 𝟮: 𝗖𝗼𝘃𝗮𝗿𝗶𝗮𝗻𝗰𝗲 𝗠𝗮𝘁𝗿𝗶𝘅
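Compute the covariance matrix, which captures how each pair of features varies together (for standardized data, this is the same as the correlation matrix).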
⤷ 𝗦𝘁𝗲𝗽 𝟯: 𝗘𝗶𝗴𝗲𝗻 𝗗𝗲𝗰𝗼𝗺𝗽𝗼𝘀𝗶𝘁𝗶𝗼𝗻
Compute the eigenvectors (the directions of the principal components) and the eigenvalues (the variance along each direction) of the covariance matrix.
⤷ 𝗦𝘁𝗲𝗽 𝟰: 𝗦𝗲𝗹𝗲𝗰𝘁 𝗖𝗼𝗺𝗽𝗼𝗻𝗲𝗻𝘁𝘀
Choose the top-k components based on the explained variance ratio.
⤷ 𝗦𝘁𝗲𝗽 𝟱: 𝗗𝗮𝘁𝗮 𝗧𝗿𝗮𝗻𝘀𝗳𝗼𝗿𝗺𝗮𝘁𝗶𝗼𝗻
Project your data onto the new 𝗣𝗖𝗔 space with fewer dimensions.
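The five steps above can be sketched in plain NumPy as follows. This is a minimal illustration rather than a production implementation; the function name `pca` and the random test data are assumptions.

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    # Step 1: standardize each feature to mean 0, variance 1
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: covariance matrix of the standardized features
    cov = np.cov(Xs, rowvar=False)
    # Step 3: eigen decomposition (eigh is meant for symmetric matrices)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]              # sort by variance, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 4: keep the top-k components and their explained variance ratio
    explained = eigvals / eigvals.sum()
    W = eigvecs[:, :k]
    # Step 5: project the data onto the selected components
    return Xs @ W, explained[:k]

# Synthetic data (assumption): 4 features, two of them strongly correlated
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
X[:, 1] += 2 * X[:, 0]

Z, explained = pca(X, k=2)
print("reduced shape:", Z.shape)
print("explained variance ratio:", np.round(explained, 3))
```

Library implementations such as scikit-learn's `PCA` typically use a singular value decomposition instead of an explicit covariance matrix, which gives the same components up to sign.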
𝗪𝗵𝗲𝗻 𝗡𝗼𝘁 𝘁𝗼 𝗨𝘀𝗲 𝗣𝗖𝗔
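⤷ 𝗣𝗖𝗔 captures only linear structure, so it works poorly when the dataset is dominated by 𝗻𝗼𝗻-𝗹𝗶𝗻𝗲𝗮𝗿 𝗿𝗲𝗹𝗮𝘁𝗶𝗼𝗻𝘀𝗵𝗶𝗽𝘀. Its variance-based components can also be distorted by 𝗵𝗶𝗴𝗵𝗹𝘆 𝘀𝗸𝗲𝘄𝗲𝗱 𝗱𝗮𝘁𝗮 or outliers.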
⤷ For non-linear data, consider 𝘁-𝗦𝗡𝗘 or 𝗮𝘂𝘁𝗼𝗲𝗻𝗰𝗼𝗱𝗲𝗿𝘀 instead.
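To get a feel for the difference, here is a small sketch that runs both PCA and t-SNE on a classic non-linear dataset, the "swiss roll" (a flat sheet rolled up in 3D). The dataset, sample size, and `perplexity` value are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Synthetic non-linear data: a 2D sheet rolled up in 3D space
X, position = make_swiss_roll(n_samples=300, noise=0.05, random_state=0)

# PCA applies a linear projection, so it cannot unroll the sheet
X_pca = PCA(n_components=2).fit_transform(X)

# t-SNE builds a non-linear embedding that tries to keep nearby points together
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print("PCA output shape:", X_pca.shape)
print("t-SNE output shape:", X_tsne.shape)
```

Plotting both results colored by `position` is a quick way to compare them. Keep in mind that t-SNE is mainly a visualization tool: its axes have no direct meaning, and it has no built-in way to project new data points.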