BBBloviations: December 2013

Saturday, December 14, 2013

College and career readiness - what?

Who wouldn't think college and career readiness is a good thing? Of course students should be prepared for college and career. But what does this mean? And why are we talking about college readiness and career readiness as if they were the same thing?

First, let's consider college readiness. Is this the same thing for all colleges? How about for all majors even at the same university? Does making sure all students can simplify radical expressions and solve quadratic equations mean they are all ready for every college program in every college? What about technical college? Business college? Do all colleges have the same criteria for admission? How can we have one set of standards to prepare all students for college when the requirements and expectations for different universities and different programs vary so greatly?

Now think about career readiness. It's even more mind-boggling to think about preparing students for all careers with the same set of standards. Seems to me we're hammering a lot of square pegs into round holes. How are we preparing the photographers, the artists, the dancers, the musicians, the athletes, the carpenters, the builders, the public servants when we expect everyone to meet the same expectations geared primarily to students with strong verbal aptitude?

Nearly everything in the common core is geared to verbal skill and reasoning. Reading complex text, writing to text, citing evidence from text. How about the careers that lean more to concrete, visual-spatial intelligence? Social-emotional intelligence? Musical intelligence? Students are not standardized. Why should their education be?

And another thing - why do we spend so much time identifying and quantifying student weaknesses, focusing on what they can't do? Is there room to also find what they can do? Can we encourage them to grow and develop other skills and knowledge besides the ones in the standards?

I'm all for preparing students for college and careers. I just happen to think that the best way to do this is to encourage students to discover and follow their passions. To become lifelong learners. To be equipped as confident, competent citizens. To take risks. To spread their wings. I am not sure we can do that when we use the same standards for every child.

I don't have the answers. Just lots of questions. But perhaps we need to have some dialogue about where we're going with our obsession with standards and high stakes testing before we lose an entire generation of children.

What the FIP?

Student academic growth is a hot topic these days. Many states are using student growth measures to evaluate teachers. As I examine data with educators, one recurring question is "What can I do to impact student growth?" Sometimes this query is a matter of professional curiosity, but sometimes it is accompanied by anxiety, frustration, and even panic, since 50% of a teacher's evaluation is based on this metric.

My usual answer is: FIP.

What is FIP, you wonder? Glad you asked. FIP stands for Formative Instructional Practices and it's hands-down the best way to improve teaching and learning that I've seen for a long time. In fact, FIP isn't new. It combines "unpacking the standards," developing clear learning targets, appropriate use of feedback, collecting and using data to inform instruction, and the best of formative assessments all in one nicely organized initiative.

Much of the training and work for FIP comes from Battelle for Kids (BfK), a non-profit organization headquartered in Columbus, Ohio with a known track record for excellent, research-based educational products. BfK has developed a series of online modules to guide teachers as they learn about and implement FIP in their classrooms. FIP schools have found that, not only can they improve their student growth scores, but they notice increased student engagement, self-confidence, and student ownership of their own learning.

I'm a firm believer that students will not take responsibility for their own learning until we give them responsibility. FIP modules give practical strategies teachers can use to increase student ownership. BfK also has several subject-specific modules in which teachers can see what actual procedures can be set up to manage this system.

My personal experience with FIP comes to me through my grandson - I'll call him Joey (not his real name or he would kill me). He moved to a FIP school in his 8th grade year. Prior to his enrollment at Adams Middle School in Johnstown, Ohio, Joey hadn't done so well in school. As a young child he suffered from all kinds of medical problems that impacted his hearing and language development, and these still affect his learning to this day. Adams Middle School is a FIP school. All of the teachers understand and implement FIP. The school schedule is structured to allow students extra time to study and reassess if they don't achieve mastery on their first summative assessment. The overwhelming sense of anyone entering the building is that all professionals are involved in helping each student learn. The amazing thing to me as a grandma is that Joey can now tell me exactly what he needs to study for tests. He can articulate what he is learning in class. And, best of all, Joey is a successful student with confidence to take risks and learn new things. Oh, Joey still sometimes fails tests the first time he takes them, but he sticks with it, studies, and ends up mastering the material so that he can get B's and C's... and even a few A's in his classes. He no longer feels stupid - and for a grandma, that's priceless.

No, I am not an employee of BfK, nor do I receive any benefit from plugging their work. I'm just an educator interested in what works and a grandma who loves learning.

Principal as Zombie

As I watched a zombie movie trailer the other day, it struck me how similar the walking dead appear to the principals with whom I work in central Ohio. I'm regularly hearing comments like "It used to be more fun than this" or "I miss how I used to be able to interact with my staff and students."

Ohio has implemented a new evaluation system called the Ohio Teacher Evaluation System (OTES). There are two parts to this evaluation: 50% is based on student growth measures and 50% is based on a teacher performance rubric. The rubric is well-defined and based on known best practice. I have much hope that it could be a tool that stimulates professional discussion and facilitates professional growth. It would, that is, if we had time to actually do it.

Principals must manage a building of hundreds of students, supervise dozens of staff members, and interact with parents on a regular basis. They attend ball games, plays, concerts, and Board meetings after hours. On top of this already full agenda, the state of Ohio has added nearly 200 additional hours of work to every principal (and many, many more hours than this to some) by requiring that every teacher be evaluated every year.

Teacher performance does not drastically change from one year to the next. Requiring a full-blown evaluation with multiple observations, conferences, and "walk-throughs" each year for every teacher is overkill, causing principals to be so overburdened that they can't take time to work the process in an effective manner. Requiring evaluations every 2-3 years would allow for more time to actually make the process work.

Senate Bill 229, now in the Ohio House, would be a step toward setting up a system that could be used to improve teaching and learning. In this bill, principals would only evaluate skilled teachers every other year and accomplished teachers every three years. Teachers who are developing or ineffective would still be evaluated annually. As educators, we want to do our very best. Reducing the number of teachers that must be evaluated each year would make the task more manageable. And maybe my principal friends wouldn't look so much like zombies.

When growth is not growth

Unless you're living in a cave, you know that teachers in many states are being evaluated and rated based on the scores their students achieve on standardized tests. I have written about why this is a bad idea here, here, here, and here. Today, I'm going to explain yet another part of the system that is patently unfair to teachers.

Once upon a time, there was an excellent teacher in an excellent school district who taught excellent students. In fact, the students were so excellent that they achieved far higher than most other students in the state in every way. They competed in tests of scholastic aptitude, they excelled in debate and music, and of course, they always scored in the very top of statewide standardized tests.

This year, the excellent teacher in the excellent school had seven of these excellent students in her excellent classroom. When the standardized tests were given, she was quite confident that her students would do well, even though three of them had been out late the night before at a concert. When the results were released, however, it was found that four of the excellent students had excellent scores that were the same or a little better than the scores they had the previous year. Unfortunately, they had scores so near the top in the past that their total gain was very small. One can't score better than 100%, after all, and all of the students had begun at scores of 93-99.

What's really unfortunate, however, is that the three excellent students who always did excellent work and had excellent academic accomplishments didn't have such excellent scores on the state tests administered the day after the concert. In fact, all three of them had scores that dropped anywhere from 12 to 28 points on the bell curve. When the State averaged the net gain for this group of excellent students, they found that the average change was negative 8.

The excellent teacher in the excellent school with the excellent students was devastated because, based on this data, the district was determined to be a FAILURE when it comes to student growth with their gifted and talented students.

This story is based on a real situation in a real school district in Ohio. And herein lies yet another problem with using Value Added Measures in determining teacher effectiveness. Average is not always the best representation of a set of data. Let me give you another example. Let's say 100 teachers are in a room, and we want to calculate the average income of the population of the room. Looking at the wages of each teacher, we determine that the average annual income is $45,000. Now, let's say Bill Gates walks into the room. His annual income is $3,710,000,000. We recalculate the average income in the room and find that the mean annual income is now about $36,777,228. Do you believe that calculating the average income gives us a truly representative, accurate look at this data?

Of course not. That's why using average is NOT a fair and accurate practice when there are students who are near the ceiling of the test score and/or there are outliers not representative of the overall student growth. There is only so far that students can go up when they start near the top, but their ability to drop in score is out of proportion to what they can gain. We simply can't use this as an accurate measure of student growth, and most certainly, we can't use this as a measure for a district's or a teacher's accountability.
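For anyone who wants to check the arithmetic, here is a minimal Python sketch of both examples. It assumes every one of the 100 teachers earns exactly $45,000 (the post only gives their average), and the seven NCE changes are hypothetical numbers picked to fit the story: four near-ceiling students gaining 0 to 2 points and three dropping between 12 and 28. The median is shown only as a point of comparison, not as what any state actually uses.

```python
from statistics import median

# Bill Gates example, using the figures quoted in the post.
# Assumption: all 100 teachers earn exactly the $45,000 average.
teacher_incomes = [45_000] * 100
gates_income = 3_710_000_000

room = teacher_incomes + [gates_income]
mean_income = sum(room) / len(room)    # dragged far upward by one outlier
median_income = median(room)           # middle value, barely notices the outlier

# Ceiling example. These NCE changes are HYPOTHETICAL, chosen only to match
# the story: four students gain 0-2 points, three students drop 12-28 points.
nce_changes = [0, 1, 1, 2, -12, -20, -28]
mean_change = sum(nce_changes) / len(nce_changes)
median_change = median(nce_changes)

print(f"mean income:       ${mean_income:,.0f}")
print(f"median income:     ${median_income:,.0f}")
print(f"mean NCE change:   {mean_change}")
print(f"median NCE change: {median_change}")
```

In both cases a handful of extreme values decides the mean, while the typical teacher's income and the typical student's change stay where they were.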

VAM: Size matters

What if I tell you that two different teachers can get the exact same growth scores on the exact same test and have completely different Value Added scores? Possible? As it turns out, yes. Fair? I'll let you decide.

As a Value Added Leader (VAL) and educational consultant in Ohio, I have the opportunity to work with many teachers in several districts to look at their teacher-level value added reports. In case you have missed the news, Ohio determines educator effectiveness by measuring how much students in teachers' classrooms "grow" on mandated standardized tests. Simply put, student scores are placed on a bell curve and then compared with where they place on the bell curve the following year. Those changes in placement on the bell curve (Normal Curve Equivalent scores, or NCE scores) are averaged across a classroom to get a mean NCE gain. In order to be "most effective," a teacher's students must have a mean NCE change of at least 2 standard errors above the mean growth score.

That's a lot of math talk, I know, but let me explain a little about "standard error" to those not familiar with statistical math concepts. Standard error is basically a measure of how much confidence I can have in the data. If I have a LOT of data, my standard error is small, since I have more confidence in the data. When I have fewer data points, I'm not so confident and so the standard error is larger. There are a couple of factors that have a direct impact on the size of the standard error - the number of students being measured and the spread of their scores.
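To make that concrete, here is a small Python sketch using the textbook formula for the standard error of a mean, the standard deviation divided by the square root of the number of students. This is a simplification I'm assuming for illustration; the state's value added model is more involved than this. The standard deviation of 16 and the class sizes are made-up numbers.

```python
import math

def standard_error(std_dev: float, n_students: int) -> float:
    """Textbook standard error of a mean: SD / sqrt(n)."""
    if n_students <= 0:
        raise ValueError("n_students must be positive")
    return std_dev / math.sqrt(n_students)

# HYPOTHETICAL spread of NCE changes, the same for every class,
# so the only thing that varies below is the number of students.
ASSUMED_SD = 16.0

for n in (10, 25, 70, 120):
    se = standard_error(ASSUMED_SD, n)
    print(f"{n:>3} students -> standard error {se:.2f}")
```

With the spread held fixed, the only thing that shrinks the standard error is adding students, which is exactly why class size ends up mattering so much.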

To put this in terms of a classroom teacher's rating, teachers in middle schools typically have 120 or so students and elementary teachers have maybe 25. Special education teachers or gifted intervention specialists have even fewer, and if two or more teachers work with the same students, their numbers are decreased even further since the students are "linked" to all of the teachers who contribute to instruction. The teachers with more students will have a small standard error and the ones with fewer students have a large standard error. So what? The problem is that this becomes a big deal when determining a teacher's effectiveness rating. Let's look at an example that I encountered at a school just yesterday.

Two middle school math teachers, one general ed and one special ed, co-teach a class of sixth grade math. We will call them Mrs. A and Mrs. B. They did an outstanding job, and their students, all low-performing students in the past, did quite well on standardized tests. Their mean NCE change was about 5. Another teacher in the same building, Mr. C, teaches three classes of the same subject each day, and had similar results - mean NCE gain of 5. In other words, their students grew the same amount on their standardized tests - the teachers all produced equal "growth" in terms of how our legislature defines growth. The standard error for Mrs. A and Mrs. B was 4.9. Mr. C, with similar results, has around 70 students and a standard error of 1.9. Remember, more students, more confidence in the data. Same growth, different standard errors because of a difference in the size of the teachers' classes. Mrs. A and Mrs. B have the smaller class, and because they both "link" to all of their students, they each get credit for only 50% of their students' results.

In Ohio, the "most effective" teachers are those with a gain index of 2.0 or higher; that is, their mean NCE change is 2 or more standard errors above the mean. Teachers with a gain index of 1-2 are "above average". These are the teachers whose mean student gain is between 1 and 2 standard errors above the mean. "Average" teachers are within plus or minus 1 standard error of the mean. Teachers between 1 and 2 standard errors below the mean are "approaching average," and those with a mean student change in NCE scores of more than 2 standard errors below the mean are "least effective." It's all about the standard error, but, as I've explained, standard error depends on the size of a teacher's class.

The exact same growth in student achievement resulted in a gain of over two standard errors for Mr. C in our example above, and so he is lauded as one of the "most effective" teachers in the state. Mrs. A and Mrs. B, however, with the exact same gain but a standard error of 4.9, are rated average. Same results, same gain in student achievement, but the teachers are evaluated very differently because of the size of their classes.
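Here is a rough Python sketch of that calculation, dividing the mean NCE gain by the standard error to get the gain index and then looking up the rating. The cut-offs follow the categories described above, but how I treat a value that lands exactly on a boundary is my own assumption, and the function names are mine, not the state's.

```python
def gain_index(mean_nce_gain: float, standard_error: float) -> float:
    """Mean NCE gain expressed in standard errors."""
    return mean_nce_gain / standard_error

def rating(index: float) -> str:
    """Map a gain index to the rating bands described in the post.

    Assumption: a value exactly on a cut-off goes to the higher band.
    """
    if index >= 2:
        return "most effective"
    if index >= 1:
        return "above average"
    if index > -1:
        return "average"
    if index > -2:
        return "approaching average"
    return "least effective"

# Rounded figures from the example: the same gain of about 5 for both.
mr_c = gain_index(5.0, 1.9)          # roughly 70 students
co_teachers = gain_index(5.0, 4.9)   # small co-taught class, 50% linkage

print(f"Mr. C:          index {mr_c:.2f} -> {rating(mr_c)}")
print(f"Mrs. A/Mrs. B:  index {co_teachers:.2f} -> {rating(co_teachers)}")
```

With these rounded numbers Mr. C comes out well above 2, while the co-teachers land at roughly 1.0, right on the line between "average" and "above average." Their reported rating of average fits a gain just under 4.9. Either way, the same growth lands in a different band purely because of the standard error.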

If this sounds unfair to you, you would be correct. There are MANY problems with using standardized test results and norm-referenced testing for accountability that I addressed before here, here, and here. But for teachers looking at "average" ratings, this problem is significant. My effectiveness should not be determined by the size of my class.