RLF_S1L2: The 8 Words That Unlock Everything — Core Vocabulary


The agent receives a reward and a new observation from the environment after taking an action.

True

False


THE 8 CORE TERMS

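Each term is paired with a symbol. The symbols for state, action, reward, policy, and value follow the standard notation of Sutton & Barto (2018); π, ε, and G as labels for the agent, environment, and episode are looser shorthand.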

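• Agent (π) — The Learner. The thing that makes decisions. Strictly, π is the symbol for the agent's policy, which is how the agent is usually represented. Example: the neural network that controls the cart in CartPole.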

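• Environment (ε) — The World. Everything the agent interacts with. Note that ε is shorthand here; in Sutton & Barto, ε usually denotes the exploration rate of an ε-greedy policy. Example: OpenAI Gym CartPole-v1.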

• State (Sₜ) — A Snapshot. The current observation the agent receives. Example: [cart position, cart velocity, pole angle, pole angular velocity].

• Action (Aₜ) — What You Do. The choice the agent makes. Example: push left = 0, push right = 1.

• Reward (Rₜ) — Feedback. A scalar signal after each action. Example: +1 for every step the pole stays up.

• Policy (π(s)) — Strategy. The function that maps states to actions. Example: a neural network that takes the four CartPole state values and outputs push left or push right.

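• Value (V(s)) — How Good? The expected total future reward from a state when following the current policy. Example: in CartPole, with +1 per step and no discounting, the expected number of further steps the pole stays up.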

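• Episode (G) — One Run. A complete sequence of states, actions, and rewards from start to termination. Strictly, G (written Gₜ) is the return: the total reward collected from step t until the episode ends. Example: from reset to the pole falling.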
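
The eight terms fit together in one interaction loop. The sketch below is a minimal plain-Python illustration, not the lesson's own code. `ToyBalanceEnv` is a hypothetical stand-in for CartPole with made-up dynamics (it is not the Gym environment), and `policy` and `run_episode` are names chosen for this example.

```python
import random


class ToyBalanceEnv:
    """Environment: hypothetical stand-in for CartPole.

    The state is an integer 'tilt'. The pole has fallen once |tilt| > 3.
    """

    def __init__(self, max_steps=50, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.tilt = 0
        self.steps = 0

    def reset(self):
        self.tilt = 0
        self.steps = 0
        return self.tilt  # initial state S_0

    def step(self, action):
        # Action: 0 = push left (-1), 1 = push right (+1).
        push = -1 if action == 0 else 1
        # Random disturbance of -1, 0, or +1 on top of the push.
        self.tilt += push + self.rng.choice([-1, 0, 1])
        self.steps += 1
        fallen = abs(self.tilt) > 3
        done = fallen or self.steps >= self.max_steps
        reward = 0.0 if fallen else 1.0  # Reward: +1 while the pole is up
        return self.tilt, reward, done


def policy(state):
    """Policy pi(s): push against the current tilt."""
    return 0 if state > 0 else 1


def run_episode(env, policy):
    """Episode: one run from reset to termination. Returns the total reward."""
    state = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(state)                  # agent picks an action
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward
    return total_reward


env = ToyBalanceEnv()
print("Return for one episode:", run_episode(env, policy))
```

Each pass through the `while` loop is one time step: the agent observes a state, picks an action, and the environment answers with a reward and the next state. Because this policy always pushes against the tilt, the toy pole never falls and the episode runs to the 50-step cap, so the return is 50.0.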

This video explains the eight core concepts (vocabulary) that form the cornerstone of reinforcement learning (RL), using the CartPole environment and a FrozenLake example to illustrate them in practice.

The eight core concepts:

Agent: the learner that interacts with the environment (0:27, 3:06).
Environment: the world the agent interacts with (0:29, 3:15).
State/Observation: a snapshot of the current situation the agent is in (0:48, 3:24, 8:13).
Action: the decisions or moves available to the agent (0:34, 3:39, 7:03).
Reward: the feedback the agent receives in response to its action, used to evaluate its performance (0:59, 3:48, 8:01).
Policy: the strategy the agent follows to choose actions, with the aim of maximizing reward (3:54).
Value: a measure of the total expected reward, used to compare different paths (4:05).
Episode: one complete run from start to finish (4:31).
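
Value is the least concrete of the eight terms, so a small numerical sketch may help. The code below estimates a value by averaging the returns of many simulated episodes. Everything in it is an assumption made for illustration: the episode model (a reward of +1 per step, with the pole falling with probability `fall_prob` after each step), the discount factor `gamma`, and the function names.

```python
import random


def discounted_return(rewards, gamma=1.0):
    """Return G = r_0 + gamma * r_1 + gamma^2 * r_2 + ... for one episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def sample_episode_rewards(rng, fall_prob=0.1):
    """Hypothetical episode: +1 per step; the pole falls with prob. fall_prob."""
    rewards = []
    while True:
        rewards.append(1.0)
        if rng.random() < fall_prob:
            return rewards


def estimate_value(num_episodes=5000, gamma=1.0, fall_prob=0.1, seed=0):
    """Estimate the start state's value as the average return over episodes."""
    rng = random.Random(seed)
    returns = [
        discounted_return(sample_episode_rewards(rng, fall_prob), gamma)
        for _ in range(num_episodes)
    ]
    return sum(returns) / len(returns)


print("Undiscounted estimate:", estimate_value())
print("Discounted estimate (gamma=0.9):", estimate_value(gamma=0.9))
```

For this toy model the exact value is 1 / (1 - gamma * (1 - fall_prob)), which is 10 without discounting, so the first estimate should land close to 10. A single return describes one episode; the value is the average over many of them.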

Key points:

How learning works: the video explains that learning in RL is cumulative. The agent starts with random actions and gradually improves through trial and error until it reaches the optimal policy (9:57 – 10:50).
Hands-on practice: the instructor notes that the course will include practical programming assignments covering a variety of environments, such as FrozenLake, Taxi, and CartPole, to reinforce these concepts (8:35 – 9:44).
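
The progression from random actions to a good policy can be sketched with tabular Q-learning. This is one common trial-and-error algorithm, chosen here for illustration; the summary above does not say which algorithm the video uses. The five-cell corridor is a hypothetical toy loosely in the spirit of FrozenLake, and the hyperparameters are arbitrary.

```python
import random

N_STATES = 5      # states 0..4; state 4 is the goal
ACTIONS = [0, 1]  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic corridor: +1 on reaching the goal, otherwise 0."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def greedy_action(q_row, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def train(num_episodes=500, alpha=0.5, gamma=0.9, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]
    epsilon = 1.0  # start fully random, as in pure trial and error
    for _ in range(num_episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)            # explore
            else:
                action = greedy_action(q[state], rng)   # exploit
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q toward reward + discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
        epsilon = max(0.05, epsilon * 0.99)  # explore less over time
    return q


q_table = train()
learned_policy = [q_table[s].index(max(q_table[s])) for s in range(N_STATES - 1)]
print("Greedy action per non-goal state:", learned_policy)
```

Early episodes wander at random because `epsilon` starts at 1.0. As the table fills in, the greedy choice in every non-goal state becomes 1 (move right), which is the optimal policy for this corridor.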

