RLF_S1L1: When You Can’t Write the Rules: Introduction to Reinforcement Learning (RL)

CONCEPT 1 — THE PROBLEM RL SOLVES

There are three ways to get a computer to solve a problem: write the rules, show it examples, or let it learn from feedback.
Sorting: humans already know the algorithm — we simply write it as code.
Cat recognition: we cannot explicitly write the rules, but we have millions of labelled examples to train on.
Walking upstairs: even experts in biomechanics cannot fully describe how it works — there are no clear rules and no labelled dataset.

Reinforcement Learning (RL) is the third approach: instead of programming the solution directly, we program a system that learns how to discover the solution by itself.

Comparing the three approaches side by side shows where RL fits. Robot walking is the motivating example: there are no rules to hand-code and no labelled dataset to train on, so neither conventional programming nor learning from examples applies directly.
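
A minimal Python sketch of the first two approaches, for contrast with RL. Everything in it is an illustrative assumption rather than part of the lesson: the function names, the toy 2-D points, and the "cat"/"dog" labels. `insertion_sort` is a rule we can write down directly, while `nearest_label` contains no hand-written rule and simply copies the label of the closest labelled example.

```python
from math import dist

def insertion_sort(items):
    """Rules: we already know the algorithm, so we write it down."""
    result = []
    for item in items:
        # Walk back from the end until the correct slot is found.
        i = len(result)
        while i > 0 and result[i - 1] > item:
            i -= 1
        result.insert(i, item)
    return result

# Examples: hypothetical labelled data as (feature vector, label) pairs.
LABELLED = [((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"),
            ((4.0, 4.2), "dog"), ((3.8, 4.1), "dog")]

def nearest_label(point, examples=LABELLED):
    """No explicit rule: answer with the label of the closest example."""
    _, label = min(examples, key=lambda ex: dist(ex[0], point))
    return label

print(insertion_sort([3, 1, 2]))
print(nearest_label((1.1, 0.9)))
```

Neither function helps with the stair-climbing case: there is no rule to write and no labelled dataset to copy answers from.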

CONCEPT 2 — HOW RL LEARNS (TRIAL, ERROR, FEEDBACK)

An RL agent learns similarly to how a baby learns to walk: try, fail, adjust, and repeat.
At the beginning, the agent makes random decisions and performs poorly.
Over time, it starts making slightly better decisions, lasts longer before failing, and improves gradually.
After many repetitions, the agent develops an effective strategy and becomes highly successful at the task.

Each failure teaches the agent which actions lead to poor outcomes, while each success strengthens behaviors that produce better results.

A key lesson in RL is that learning happens gradually through repeated interaction and feedback.
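
The try, fail, adjust, repeat cycle can be sketched with one of the simplest RL settings, an epsilon-greedy multi-armed bandit. This is a stand-in chosen for brevity, not the method used in the lesson; the function name `train_bandit`, the success probabilities, and the hyperparameters are all assumptions. Each action either succeeds (reward 1) or fails (reward 0), and the agent nudges its estimate for that action toward the observed reward.

```python
import random

def train_bandit(success_probs, steps=20000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial and error on a toy multi-armed bandit."""
    rng = random.Random(seed)
    values = [0.0] * len(success_probs)  # estimated reward of each action
    counts = [0] * len(success_probs)    # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(values))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(values)), key=lambda a: values[a])
        # Feedback from the (hypothetical) environment: success or failure.
        reward = 1.0 if rng.random() < success_probs[action] else 0.0
        counts[action] += 1
        # Adjust: move the running average toward the observed reward.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

values, counts = train_bandit([0.2, 0.5, 0.8])
best = max(range(len(values)), key=lambda a: values[a])
print("estimated values:", [round(v, 2) for v in values])
print("tries per action:", counts, "best action:", best)
```

Failures pull an action's estimate down and successes push it up, so the agent gradually spends most of its tries on the action that pays off most often.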

CONCEPT 3 — RL IS ALREADY EVERYWHERE

Reinforcement Learning is already used in many real-world applications, including:

ChatGPT: improved through Reinforcement Learning from Human Feedback (RLHF).
Self-driving vehicles: helping systems make driving decisions such as lane changes.
Recommendation systems: suggesting movies, products, or content based on user behavior.
Warehouse automation: optimizing robot movement and logistics.
Gaming AI: training agents to master complex games.

Many of these applications have become practical only in recent years due to advances in AI and computing power.

CONCEPT 4 — THREE PARADIGMS COMPARED

Supervised Learning: “Here is the correct answer.” Requires labelled training data.
Unsupervised Learning: “Find hidden patterns.” Uses unlabelled data.
Reinforcement Learning: “Try actions and learn what works.” Requires an environment and feedback.

RL differs from the other two paradigms in that the agent’s own actions determine which experiences, and therefore which data, it learns from.
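
To make the last point concrete, here is a small sketch of an agent-environment loop in which two different policies collect different data from the same environment. The corridor environment, the `step` and `collect` functions, and the two fixed policies are hypothetical names invented for this illustration.

```python
def step(position, action, goal=4):
    """Hypothetical corridor: positions 0..goal, action is -1 or +1."""
    next_position = min(max(position + action, 0), goal)
    # Reward 1.0 for every step that ends at the goal, otherwise 0.0.
    reward = 1.0 if next_position == goal else 0.0
    return next_position, reward

def collect(policy, start=2, steps=5):
    """Run a policy and record (state, action, reward, next_state) tuples."""
    position, experience = start, []
    for _ in range(steps):
        action = policy(position)
        next_position, reward = step(position, action)
        experience.append((position, action, reward, next_position))
        position = next_position
    return experience

def always_left(position):
    return -1

def always_right(position):
    return +1

left_data = collect(always_left)
right_data = collect(always_right)
print("states seen going left: ", sorted({e[0] for e in left_data}))
print("states seen going right:", sorted({e[0] for e in right_data}))
print("total reward:", sum(e[2] for e in left_data), "vs", sum(e[2] for e in right_data))
```

In supervised or unsupervised learning the dataset is fixed before training starts. Here the agent that always moves left never observes the goal at all, so what it can learn depends on how it behaves.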

CONCEPT 1: WHAT PROBLEM DOES REINFORCEMENT LEARNING (RL) SOLVE?

There are three main ways for a computer to learn: rules, examples, and feedback.
Sorting: humans already know the algorithm, so we only need to write it as code.
Cat recognition: we cannot write the rules precisely, but we have millions of labelled examples (labelled data) to train on.
Climbing stairs or walking: even experts in human movement cannot fully describe the process; there are no clear rules and no ready-made training data.

This is where Reinforcement Learning (RL) comes in: instead of programming the solution directly, we program a method that lets the system discover the solution by itself.

The three learning approaches are compared to show where RL sits among them, and the robot-walking example is used to show why the problem is hard for traditional learning techniques.

CONCEPT 2: HOW DOES RL LEARN? (TRIAL, ERROR, AND FEEDBACK)

An RL agent learns the same way a child learns to walk: it tries, fails, adjusts, and tries again.
At first, decisions are random and results are poor.
Over time, the system improves gradually and achieves better results.
After a large number of attempts, the agent develops an effective strategy and can perform the task with high proficiency.

Every failure teaches the agent which actions lead to bad outcomes, while successes reinforce the correct behaviors.

The core idea is that improvement in reinforcement learning happens gradually, through continued experience and feedback.

CONCEPT 3: REINFORCEMENT LEARNING IS ALREADY EVERYWHERE

Reinforcement learning is used today in many real-world applications, such as:

ChatGPT: improved using Reinforcement Learning from Human Feedback (RLHF).
Self-driving cars: making decisions such as lane changes.
Recommendation systems: suggesting suitable movies or products, for example.
Warehouse robots: optimizing movement and managing logistics.
Gaming AI: training systems to master complex games.

Many of these applications became practically feasible only in recent years, thanks to major advances in AI and computing power.

CONCEPT 4: A COMPARISON OF THREE LEARNING PARADIGMS

Supervised Learning: “Here is the correct answer.” Needs labelled training data.
Unsupervised Learning: “Discover the patterns yourself.” Relies on unlabelled data.
Reinforcement Learning: “Try, and learn what works.” Needs only an environment, experience, and feedback.

RL stands out as the only one of these paradigms in which the agent’s own decisions affect the data and experiences it will later learn from, which makes it fundamentally different from the other learning approaches.
