All Strength of Schedule calculations I have seen have some major flaw(s).
Depending on the calculation, some common recurring faults:
1. They don't count FCS teams correctly
Most simply leave the FCS team out and rescale the other games. This makes the FCS team worth the same as the average FBS opponent on the schedule - clearly not the case.
An FCS team should be valued the same as their record against FBS opponents with their remaining 12 games considered losses (a likely outcome for an FCS team that played 12 FBS opponents). The result is better than a loss, but not much better.
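As a rough sketch of that valuation in Python: it assumes the FCS team is scored as if it had played a 12-game FBS schedule, with only its actual wins over FBS teams counted and everything else treated as a loss. The function name and the `schedule_length` parameter are made up for illustration.

```python
def fcs_opponent_value(fbs_wins: int, schedule_length: int = 12) -> float:
    """Win fraction for an FCS opponent, counting only wins over FBS teams.

    Assumption: every game that was not a win over an FBS team is treated
    as a loss across a hypothetical schedule of `schedule_length` games.
    """
    if not 0 <= fbs_wins <= schedule_length:
        raise ValueError("fbs_wins must be between 0 and schedule_length")
    return fbs_wins / schedule_length


# An FCS team that went 1-1 against FBS opponents is valued as a 1-11 team.
value = fcs_opponent_value(1)
```

An FCS team with no FBS wins comes out at 0.0, the same as a loss, and one with a single FBS win comes out only slightly better.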
2. They penalize teams for winning, reward teams for losing
If the opponents' games against the team in question are counted, the calculation penalizes teams for winning and rewards teams for losing.
This is apparent in rankings where the top teams have lousy records - by losing, they give their opponents lots of additional wins compared to the teams that have better W-L records.
An SOS calculation should be a measurement of a team's opponents, not how they performed in that schedule (the latter would be a computer poll). SOS calculations shouldn't count the head to head result of the team in question.
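A minimal sketch of leaving out the head to head result, assuming each game is stored as a (winner, loser) pair. The `record_excluding` helper and the team labels are hypothetical.

```python
from typing import List, Tuple

Game = Tuple[str, str]  # (winner, loser)


def record_excluding(team: str, excluded: str, games: List[Game]) -> Tuple[int, int]:
    """Return (wins, losses) for `team`, skipping any game against `excluded`."""
    wins = losses = 0
    for winner, loser in games:
        if excluded in (winner, loser):
            continue  # ignore every game involving the team being rated
        if winner == team:
            wins += 1
        elif loser == team:
            losses += 1
    return wins, losses


# Hypothetical results: A beat B, B beat C and D, E beat B.
games = [("A", "B"), ("B", "C"), ("B", "D"), ("E", "B")]
record_for_a = record_excluding("B", "A", games)  # B's record as A's opponent
```

Here B counts as 2-1 toward A's schedule instead of 2-2, so A is not penalized for having beaten B.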
3. They don't count teams they lose to
By incorporating the team's record, the SOS transitions to a computer poll under an SOS name.
4. They penalize teams that play more conference games
Once teams enter their conference play, their max number of points is nearly sealed.
At the current extremes, a 10 team conference playing 9 conference games starts out 10-10 compared to an 8 team conference playing 7 games. The difference can be made up, but starting one group 10-10 before the season starts is a flaw.
This is a downside that cannot be resolved without exacerbating other problems.
5. Scaling for an unequal number of games
Continue adding points and you penalize teams from smaller conferences that don't play a title game or a road game against Hawaii.
Scale all to a 12 game schedule and you discount the extra game against a top opponent.
If used properly, there isn't a reason not to count the 13th game without scaling - the arising problem is the misapplication of the SOS by the fans, something that shouldn't factor into the calculation itself.
In the opposite case, where a team plays fewer games, it gets a bit easier. If a game is canceled because of weather, etc., it should still count in the SOS (as noted above, the SOS shouldn't count the team's performance against an opponent, so what the result would have been is irrelevant in the calculation).
6. They don't account for the venue
There is an advantage to playing at home. There is an advantage to playing in large venues. Playing in front of a loud home fan base is an advantage. Few neutral fields are truly neutral.
You could never fully account for the venue without a decibel meter at every game, but you can make an adjustment based on historical results.
7. Some don't have a sufficient number of layers
By layer I mean who a team plays, who their opponents beat, who their opponents' opponents beat, etc. The more layers, the more accurate - 3 should be the minimum, not the standard.
8. Improper discounts per layer
Most make each layer worth less and less, as they should - the more games, the greater the chance of an upset (an upset win by a team you played should be just an upset win by another team, not a boost in your schedule strength). Most use multipliers around 1/2 for each level (1, 1/2, 1/4, etc.) or no multiplier at all, but these are far higher than the chance of an upset. Consider that 2 nearly equal teams will have close to a 50/50 chance of winning - the overall average is over a 70% chance that the better team wins.
This would put the multiplier by layer at around 1.0, 0.7, 0.5, 0.35, 0.25, etc.
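Those multipliers are just 0.7 raised to the layer number. A small sketch, assuming a flat 70% chance that the better team wins; the weighted average in `weighted_sos` is only one illustrative way to combine the layers, and both function names are made up.

```python
from typing import List


def layer_weights(num_layers: int, favorite_win_rate: float = 0.7) -> List[float]:
    """Multiplier for each layer: 1.0 for direct opponents, then 0.7, 0.49, ..."""
    return [favorite_win_rate ** k for k in range(num_layers)]


def weighted_sos(layer_values: List[float], favorite_win_rate: float = 0.7) -> float:
    """Combine per-layer win fractions into one number (illustrative only).

    Assumption: layers are merged with a weighted average so the result
    stays on the same 0-to-1 scale as the inputs.
    """
    weights = layer_weights(len(layer_values), favorite_win_rate)
    return sum(w * v for w, v in zip(weights, layer_values)) / sum(weights)


weights = layer_weights(5)  # roughly 1.0, 0.7, 0.49, 0.343, 0.24
```

With these weights the third layer still carries about half the weight of the first, compared to a quarter under the usual 1, 1/2, 1/4 scheme.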
Even if you created an accurate SOS measurement, what would you really have? It isn't a reflection of how good a team is, just a reflection of how good their opponents are. It can bring into question a team's performance, but it can't prove their performance - most fans use it in the latter respect, reflecting an inability to differentiate between an SOS calculation and a computer poll.
To show which team is better, SOS has to be coupled to the W-L record creating a computer poll. When you incorporate an SOS calculation into a computer poll, it becomes necessary to count a team's record against their schedule (2 and 3 above) and scale by the number of teams played (5 above).
The BCS has this aspect about right - the computer polls (most of which use an SOS calculation as their basis) are a substantial 1/3 of the final ranking, but they aren't the majority. They serve as a type of reality check for the pollsters - a team doesn't have to be better than the next lower to hold their BCS poll position, they just can't be too far behind.
If I were to improve the current BCS Calc, I would suggest a scaling rather than a hard number for the computer polls.
The voting polls are scaled by the number of votes, not a hard 1-2-3-etc. The computer polls have the ability to do this given their inherent numeric calculation. This would reflect a better team to team difference rather than a 1-2-3-etc. ranking.
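To illustrate the difference, here is a sketch that turns a computer poll's raw ratings into scaled scores instead of ordinal ranks. It assumes a simple min-max scaling within each poll, which is only one possible choice and not part of the BCS formula; the ratings are invented for the example.

```python
from typing import Dict


def scaled_scores(ratings: Dict[str, float]) -> Dict[str, float]:
    """Scale raw ratings to the 0-to-1 range (assumed min-max scaling)."""
    lo, hi = min(ratings.values()), max(ratings.values())
    if hi == lo:
        return {team: 1.0 for team in ratings}  # every team is rated the same
    return {team: (r - lo) / (hi - lo) for team, r in ratings.items()}


# Invented ratings: A and B are nearly even, C is far behind.
scores = scaled_scores({"A": 95.0, "B": 94.0, "C": 80.0})
```

A 1-2-3 ranking treats the gap from A to B the same as the gap from B to C. The scaled scores keep B right behind A and C well back.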
You could also improve the BCS computer poll calculation by averaging the most accurate polls over a 4 or 5 year sliding window and only using the most accurate polls. The accuracy would be measured by their ability to predict bowl results after the regular season is over (the time frame of the most significance to the BCS). The accuracy of the result could be improved by making the number of polls used variable, using however many are needed to yield the best prediction over time.
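A sketch of the sliding window selection, assuming you have each poll's bowl prediction results by year as (correct picks, games picked). The data layout, the function name, and the poll names are all hypothetical, and the numbers are invented.

```python
from typing import Dict, List, Tuple

# poll name -> {season: (correct bowl picks, bowl games picked)}
History = Dict[str, Dict[int, Tuple[int, int]]]


def rank_polls_by_bowl_accuracy(
    history: History, last_year: int, window: int = 5
) -> List[Tuple[str, float]]:
    """Rank polls by bowl prediction accuracy over the last `window` seasons."""
    years = range(last_year - window + 1, last_year + 1)
    ranked = []
    for poll, by_year in history.items():
        correct = sum(by_year[y][0] for y in years if y in by_year)
        total = sum(by_year[y][1] for y in years if y in by_year)
        if total:  # skip polls with no picks inside the window
            ranked.append((poll, correct / total))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Invented numbers for two made-up polls.
history = {
    "poll_x": {2009: (20, 34), 2010: (22, 35)},
    "poll_y": {2009: (24, 34), 2010: (21, 35)},
}
ranking = rank_polls_by_bowl_accuracy(history, last_year=2010, window=2)
```

Taking the top few entries of the ranking gives the polls to average. How many to keep could then be tuned by checking which cutoff would have predicted past bowl results best.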